Abstract

We present efficient algorithms for computing a maximum agreement forest (MAF) of a 
pair of multifurcating (nonbinary) rooted trees. Our algorithms match the running times of 
the currently best algorithms for the binary case. The size of an MAF corresponds to the 
subtree prune-and-regraft (SPR) distance of the two trees and is intimately connected to their 
hybridization number. These distance measures are essential tools for understanding reticulate 
evolution, such as lateral gene transfer, recombination, and hybridization. Multifurcating trees 
arise naturally as a result of statistical uncertainty in current tree construction methods. 



1 Introduction 

Phylogenetic trees are the standard model for representing the evolution of a set of species (taxa) 
through "vertical" inheritance [1]. Yet, genetic material can also be shared between contemporary
organisms via lateral gene transfer, recombination, or hybridization. These processes allow species to
rapidly adapt to new environments as shown by, for example, the rapid spread of antibiotic resistance
and other harmful traits in pathogenic bacteria [2]. Untangling vertical and lateral evolutionary
histories is thus both difficult and of great importance. To do so often requires the comparison of 
phylogenetic trees for individual gene histories with a reference tree. Distance measures that model 
reticulation events using subtree prune-and-regraft (SPR) [1] and hybridization [3] operations are of 
particular interest in such comparisons due to their direct evolutionary interpretations.
These distance measures are biologically meaningful but also NP-hard to compute. As a
result, there has been significant effort to develop efficient fixed-parameter [8-12] and approximation
[8, 13, 14] algorithms to compute these distances, most of which use the equivalent notion of
maximum agreement forests (MAFs) [3, 5, 15]. Efficient algorithms for computing these distances
have generally been restricted to binary trees. The exceptions are reduction rules for computing
hybridization numbers of nonbinary trees [16] and a recent depth-bounded search algorithm [17] for
computing the subtree prune-and-regraft distance of such trees.

Multifurcations (or polytomies) are vertices of a tree with more than two children. A multifurcation
is hard if it indeed represents an inferred common ancestor which produced three or more species
as direct descendants; it is soft if it simply represents ambiguous evolutionary relationships [18].
Simultaneous speciation events are assumed to be rare, so a common assumption is that all 
multifurcations are soft. If we force the resolution of multifurcating trees into binary trees, then
we infer evolutionary relationships that are not supported by the original data and may infer
meaningless reticulation events. Thus, it is crucial to develop efficient algorithms to compare
multifurcating trees directly.

Figure 1: (a) An X-tree T. (b) The subtree T(V) for V = {1,2,4}. (c) T|V. (d) A binary resolution
of T. (e) Illustration of an SPR operation applied to the binary resolution of T.

In this paper, we extend the fastest approximation and fixed-parameter algorithms for computing 
MAFs of binary rooted trees to multifurcating trees (thus showing that computing MAFs for 
multifurcating trees is fixed-parameter tractable). The size of an MAF of two binary trees is 
equivalent to their SPR distance. In keeping with the assumption that multifurcations are soft, we 
define an MAF of two multifurcating trees so that its size is equivalent to what we call the soft 
SPR distance: the minimum number of SPR operations required to transform a binary resolution of 
one tree into a binary resolution of the other. This distinction avoids the inference of meaningless 
differences between the trees that arises, for example, when one tree has a set of resolved bifurcations 
that are part of a multifurcation in the other tree. This is similar to the extension of the hybridization
number to multifurcating trees by Linz and Semple [16].



Our fixed-parameter algorithm achieves the same running time as in the binary case. Our 
approximation algorithm achieves the same approximation factor as in the binary case at the cost of 
increasing the running time from linear to O(n lg n). These results are not trivial extensions of the
algorithms for binary trees. They require new structural insights and a novel method for terminating
search branches of the depth-bounded search tree, coupled with a careful analysis of the resulting 
recurrence relation. 

The rest of this paper is organized as follows. Section 2 introduces the necessary terminology
and notation. Section 3 presents the key structural results for multifurcating agreement forests.
Section 4 presents our new algorithms based on these results. Finally, in Section 5, we present
closing remarks and discuss open problems and possible extensions of this work.

Figure 2: (a) SPR operations transforming the tree T from Figure 1(a) into the second tree in
Figure 1(e). Each operation changes the top endpoint of one of the dotted edges. The hard
SPR distance between the two trees is 2. (b) The MAF representing only the first transfer of (a)
(equivalently, Figure 1(e)). The second transfer is unnecessary if the multifurcation represents
ambiguous data rather than simultaneous speciation. Thus, the soft SPR distance is 1.



2 Preliminaries 

Throughout this paper, we mostly use the definitions and notation from [5, 13, 14, 16, 19, 20]. A
(rooted phylogenetic) X-tree is a rooted tree T whose leaves are the elements of a label set X and
whose non-root internal nodes have at least two children each; see Figure 1(a). T is binary (or
bifurcating) if all non-root internal nodes have exactly two children each; otherwise it is multifurcating. The
root of T has label $\rho$ and has one child. Throughout this paper, we consider $\rho$ to be a member of X.



For a subset V of X, T(V) is the smallest subtree of T that connects all nodes in V; see Figure 1(b).
The V-tree induced by T is the smallest tree T|V that can be obtained from T(V) by contracting
unlabelled nodes with only one child, that is, by merging each such node with one of its neighbours
and removing the edge between them. See Figure 1(c). An expansion does the opposite: It splits a
node v into two nodes $v_1$ and $v_2$ such that $v_1$ is $v_2$'s parent and divides the children of v into two
subsets that become the children of $v_1$ and $v_2$, respectively. For brevity, we refer to this operation
as expanding the subset of v's children that become $v_2$'s children.
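The restriction T|V can be illustrated with a minimal Python sketch. It assumes a nested-tuple encoding of rooted trees that is not taken from the paper: leaves are plain labels, internal nodes are tuples of children, and the root label $\rho$ is omitted. The function name `restrict` and the example tree are hypothetical.

```python
def restrict(tree, keep):
    """Return T|V for a nested-tuple tree, or None if no leaf of V occurs.

    Leaves are labels (anything that is not a tuple); internal nodes are
    tuples of children. Nodes left with one child are contracted away.
    """
    if not isinstance(tree, tuple):
        return tree if tree in keep else None
    kids = [r for r in (restrict(child, keep) for child in tree) if r is not None]
    if not kids:
        return None
    if len(kids) == 1:
        return kids[0]  # contract an unlabelled node with a single child
    return tuple(kids)


# Hypothetical multifurcating tree on the labels 1..6.
example = ((1, 2, 3), (4, (5, 6)))
induced = restrict(example, {1, 2, 4})  # ((1, 2), 4)
```

The sketch computes T(V) and the contraction in one recursive pass, since a node of T(V) is contracted exactly when it keeps a single child.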

Let $T_1$ and $T_2$ be two X-trees. We say that $T_2$ resolves $T_1$ or, equivalently, $T_2$ is a resolution of
$T_1$ if $T_1$ can be obtained from $T_2$ by contracting internal edges. $T_2$ is a binary resolution of $T_1$ if $T_2$
is binary. See Figure 1(d).
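One way to test the resolution relation is through clusters (the leaf sets below internal nodes): for two rooted trees on the same label set, contracting internal edges only removes clusters, so $T_2$ resolves $T_1$ exactly when every cluster of $T_1$ is also a cluster of $T_2$. The sketch below uses this standard characterization rather than anything stated in the paper, with an assumed nested-tuple encoding (leaves are labels, the root label $\rho$ is omitted); all function names are hypothetical.

```python
def clusters(tree):
    """Return the set of leaf sets below the internal nodes of a nested-tuple tree."""
    found = set()

    def visit(node):
        if not isinstance(node, tuple):
            return frozenset([node])
        leaves = frozenset().union(*(visit(child) for child in node))
        found.add(leaves)
        return leaves

    visit(tree)
    return found


def is_binary(tree):
    """True if every internal node has exactly two children."""
    if not isinstance(tree, tuple):
        return True
    return len(tree) == 2 and all(is_binary(child) for child in tree)


def resolves(t2, t1):
    """True if t2 resolves t1; assumes both trees have the same label set."""
    return clusters(t1) <= clusters(t2)


soft = ((1, 2, 3), 4)                 # hypothetical tree with one multifurcation
binary_resolution = (((1, 2), 3), 4)  # one of its three binary resolutions
```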



A subtree prune-and-regraft (SPR) operation on a binary rooted X-tree T cuts an edge $x p_x$,
where $p_x$ denotes the parent of x. This divides T into subtrees $T_x$ and $T_{p_x}$ containing x and $p_x$,
respectively. Then it introduces a node $p'_x$ into $T_{p_x}$ by subdividing an edge of $T_{p_x}$ and adds an
edge $x p'_x$, thereby making x a child of $p'_x$. Finally, $p_x$ is removed using a contraction. See Figure 1(e).
On a multifurcating tree, an SPR operation may also use any existing node of $T_{p_x}$ as $p'_x$ and contracts
$p_x$ only if it has only one child besides x.

SPR operations give rise to a distance measure $d_{SPR}(\cdot, \cdot)$ between binary X-trees, defined as
the minimum number of such operations required to transform one tree into the other. The trees
in Figure 1(e), for example, have SPR distance $d_{SPR}(T_1, T_2) = 1$. An analogous distance measure,
which we call the hard SPR distance, could be defined for multifurcating X-trees; however, under the
assumption that most multifurcations are soft, this would capture differences between the trees that
are meaningless. Instead, we define the soft SPR distance $d_{sSPR}(T_1, T_2)$ between two multifurcating
trees $T_1$ and $T_2$ to be the minimum SPR distance of all pairs of binary resolutions of $T_1$ and $T_2$.
For simplicity, we refer to this as the SPR distance in the remainder of this paper. These
two distance measures are illustrated in Figure 2. Note that the soft SPR distance is not a metric
but captures the minimum number of SPR operations needed to explain the difference between the
two trees.

These distance measures are related to the sizes of appropriately defined agreement forests. To
define these, we first introduce some terminology. For a forest F whose components $T_1, T_2, \ldots, T_k$
have label sets $X_1, X_2, \ldots, X_k$, we say F yields the forest with components $T_1|X_1, T_2|X_2, \ldots, T_k|X_k$;
if $X_i = \emptyset$, then $T_i(X_i) = \emptyset$ and, hence, $T_i|X_i = \emptyset$. For a subset E of edges of F, we use $F - E$
to denote the forest obtained by deleting the edges in E from F, and $F \div E$ to denote the forest
yielded by $F - E$. Thus, $F \div E$ is the contracted form of $F - E$. We say $F \div E$ is a forest of F.
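To make the effect of cutting edges concrete, the following sketch computes the label sets of the components of $F \div E$. It assumes a hypothetical encoding that is not part of the paper: the forest is a child-to-parent dictionary, and each cut edge is identified by its lower endpoint. Components that retain no labelled leaf disappear, matching the convention that an empty label set yields the empty tree.

```python
def component_label_sets(parent, leaves, cut):
    """Label sets of the components obtained by deleting the edges in `cut`.

    parent: dict mapping each node to its parent (None for a root).
    leaves: iterable of labelled leaves.
    cut:    set of nodes whose parent edge is deleted.
    """
    groups = {}
    for leaf in leaves:
        node = leaf
        # Climb until we reach the top of the leaf's component.
        while node not in cut and parent.get(node) is not None:
            node = parent[node]
        groups.setdefault(node, set()).add(leaf)
    return sorted(groups.values(), key=lambda labels: sorted(labels))


# Hypothetical tree: root r with children u and c; u has children a and b.
parent = {"a": "u", "b": "u", "u": "r", "c": "r", "r": None}
parts = component_label_sets(parent, ["a", "b", "c"], {"u"})
```

Here cutting the edge above `u` separates `{a, b}` from `{c}`.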

Given X-trees $T_1$ and $T_2$ and forests $F_1$ of $T_1$ and $F_2$ of $T_2$, a forest F is an agreement forest
(AF) of $F_1$ and $F_2$ if it is a forest of a binary resolution of $F_1$ and of a binary resolution of
$F_2$. F is a maximum agreement forest (MAF) of $F_1$ and $F_2$ if there is no AF of $F_1$ and $F_2$
with fewer components. An MAF of the trees from Figure 2(a) is shown in Figure 2(b). We
denote the number of components in an MAF of $F_1$ and $F_2$ by $m(F_1, F_2)$, and the size of the
smallest edge set E such that $F' \div E$ is an AF of $F_1$ and $F_2$ by $e(F_1, F_2, F)$, where F is a
forest of $F_2$ and $F'$ is a binary resolution of F. Bordewich and Semple [5] showed that, for two
binary rooted X-trees $T_1$ and $T_2$, $d_{SPR}(T_1, T_2) = e(T_1, T_2, T_2) = m(T_1, T_2) - 1$. This implies that
$d_{sSPR}(T_1, T_2) = e(T_1, T_2, T_2) = m(T_1, T_2) - 1$ for two arbitrary rooted X-trees because $d_{sSPR}(T_1, T_2)$,
$e(T_1, T_2, T_2)$, and $m(T_1, T_2)$ are taken as the minimum over all binary resolutions of $T_1$ and $T_2$.
Thus, to determine the SPR distance between two rooted X-trees, we need to compute a binary
MAF of the two trees.

We write $a \sim_F b$ when there exists a path between two nodes a and b of a forest F. For a node
x of F, $F^x$ denotes the subtree of F induced by all descendants of x, inclusive. For two rooted
forests $F_1$ and $F_2$ and a node $a \in F_1$, we say that a exists in $F_2$ if there exists a node $a'$ in $F_2$ such
that $F_1^a = F_2^{a'}$. For simplicity, we refer to both a and $a'$ simply as a. For forests $F_1$ and $F_2$ and nodes
$a, c \in F_1$ with a common parent, we say {a, c} is a sibling pair of $F_1$ if a and c exist in $F_2$. Figure 3
shows such a sibling pair. We say $\{a_1, a_2, \ldots, a_m\}$ is a sibling group if $\{a_i, a_j\}$ is a sibling pair of
$F_1$, for all $1 \le i < j \le m$, and $a_1$ has no sibling not in the group.

The correctness proofs of our algorithms in the next sections make use of the following three 
lemmas. Lemma 1 was shown by Bordewich et al. [14] for binary trees. The proof trivially extends
to multifurcating trees. 

Lemma 1. Let F be a forest of an X-tree, e and f edges of F, and E a subset of edges of F such
that $f \in E$ and $e \notin E$. Let $v_f$ be the end vertex of f closest to e, and $v_e$ an end vertex of e. If (1)
$v_f \sim_{F-E} v_e$ and (2) $x \nsim_{F-(E \cup \{e\})} v_f$, for all $x \in X$, then $F \div E = F \div (E \setminus \{f\} \cup \{e\})$.

Let $F_1$ and $F_2$ be forests of X-trees $T_1$ and $T_2$, respectively. Any agreement forest of $F_1$ and $F_2$
is clearly also an agreement forest of $T_1$ and $T_2$. Conversely, an agreement forest of $T_1$ and $T_2$ is an
agreement forest of $F_1$ and $F_2$ if it is a forest of $F_2$ and there are no two leaves a and b such that
$a \sim_{F_2} b$ but $a \nsim_{F_1} b$. This is formalized in the following lemma. Our algorithms ensure that any
intermediate forests $F_1$ and $F_2$ they produce have this latter property. Thus, this lemma allows us
to reason about agreement forests of $F_1$ and $F_2$ and of $T_1$ and $T_2$ interchangeably, as long as they
are forests of $F_2$.

Lemma 2. Let $F_1$ and $F_2$ be forests of X-trees $T_1$ and $T_2$, respectively. Let $F_1$ be the union of trees
$\dot{T}_1, \dot{T}_2, \ldots, \dot{T}_k$ and $F_2$ be the union of forests $\dot{F}_1, \dot{F}_2, \ldots, \dot{F}_k$ such that $\dot{T}_i$ and $\dot{F}_i$ have the same label
set, for all $1 \le i \le k$. Let $F'_2$ be a resolution of $F_2$. $F'_2 \div E$ is an AF of $T_1$ and $T_2$ if and only if it
is an AF of $F_1$ and $F_2$.



Figure 3: A sibling pair {a, c} of two forests $F_1$ and $F_2$: a and c have a common parent in $F_1$, and
both subtrees $F_1^a$ and $F_1^c$ also exist in $F_2$.



To use Lemma 1 to prove structural properties of agreement forests, which are defined in terms
of resolutions of forests, we also need the following lemma, which specifies when an expansion does
not change the SPR distance. Its proof is provided in Section S6.1 in the supplementary material.



Lemma 3. Let $F_1$ and $F_2$ be resolutions of forests of rooted X-trees $T_1$ and $T_2$, and let $F \div E$
be a maximum agreement forest of $F_1$ and $F_2$, where F is a binary resolution of $F_2$. Let
$a_1, a_2, \ldots, a_p, a_{p+1}, \ldots, a_m$ be the children of a node in $F_2$ and let $F'_2$ be the result of expanding
$\{a_{p+1}, a_{p+2}, \ldots, a_m\}$ in $F_2$. If $a'_i \nsim_{F \div E} a'_j$, for all $1 \le i \le p$, $p + 1 \le j \le m$, and all leaves $a'_i \in F_2^{a_i}$
and $a'_j \in F_2^{a_j}$, then $e(F_1, F_2, F_2) = e(F_1, F_2, F'_2)$.

A triple ab|c of a rooted forest F is defined by a set {a, b, c} of three leaves in the same component
of F and such that the path from a to b in F is disjoint from the path from c to the root of the
component. Multifurcating trees also allow for triples a|b|c where a, b, and c share the same lowest
common ancestor (LCA). A triple ab|c of a forest $F_1$ is compatible with a forest $F_2$ if it is also a
triple of $F_2$ or $F_2$ contains the triple a|b|c; otherwise it is incompatible with $F_2$.
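The compatibility test for a single triple can be sketched as follows. The sketch works on single trees rather than forests and assumes a nested-tuple encoding (leaves are labels, the root label $\rho$ is omitted) that is not part of the paper; `restrict`, `cherry`, and `compatible` are hypothetical helper names. It also assumes the three leaves are distinct and occur in both trees.

```python
def restrict(tree, keep):
    """Restrict a nested-tuple tree to the leaves in `keep`, contracting unary nodes."""
    if not isinstance(tree, tuple):
        return tree if tree in keep else None
    kids = [r for r in (restrict(child, keep) for child in tree) if r is not None]
    if not kids:
        return None
    return kids[0] if len(kids) == 1 else tuple(kids)


def cherry(tree, a, b, c):
    """Return the pair grouped together by the triple on {a, b, c}, or None for a|b|c."""
    induced = restrict(tree, {a, b, c})
    if len(induced) == 3:
        return None  # all three leaves share the same LCA
    pair = next(child for child in induced if isinstance(child, tuple))
    return frozenset(pair)


def compatible(t1, t2, a, b, c):
    """Is the resolved triple of t1 on {a, b, c} compatible with t2?"""
    c1, c2 = cherry(t1, a, b, c), cherry(t2, a, b, c)
    if c1 is None:
        raise ValueError("t1 does not resolve this triple")
    return c2 is None or c1 == c2


t1 = ((1, 2), 3, 4)   # contains the triple 12|3
soft = (1, 2, 3, 4)   # contains 1|2|3, so 12|3 is compatible with it
clash = ((1, 3), 2, 4)  # contains 13|2, so 12|3 is incompatible with it
```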

An agreement forest of two forests $F_1$ and $F_2$ cannot contain a triple incompatible with either of
the two forests. Thus, we have the following observation.

Observation 1. Let $F_1$ and $F_2$ be forests of rooted X-trees $T_1$ and $T_2$, and let F be an agreement
forest of $F_1$ and $F_2$. If ab|c is a triple of $F_1$ incompatible with $F_2$, then $a \nsim_F b$ or $a \nsim_F c$.

For two forests $F_1$ and $F_2$ with the same label set, two components $C_1$ and $C_2$ of $F_2$ are said to
overlap in $F_1$ if there exist leaves $a, b \in C_1$ and $c, d \in C_2$ such that the paths from a to b and from c
to d in $F_1$ exist and are not edge-disjoint. The following lemma is an easy extension of a lemma
of [14], which states the same result for binary trees instead of binary forests.

Lemma 4. Let $F_1$ and $F_2$ be binary resolutions of forests of two X-trees $T_1$ and $T_2$, and denote
the label sets of the components of $F_1$ by $X_1, X_2, \ldots, X_k$ and the label sets of the components of $F_2$
by $Y_1, Y_2, \ldots, Y_l$. $F_2$ is a forest of $F_1$ if and only if (1) for every $Y_j$, there exists an $X_i$ such that
$Y_j \subseteq X_i$, (2) no two components of $F_2$ overlap in $F_1$, and (3) no triple of $F_2$ is incompatible with
$F_1$.



3 The Structure of Multifurcating Agreement Forests

Figure 4: Tree labels for a sibling group $\{a_1, a_2, \ldots, a_m\}$ such that $a_1, a_2, \ldots, a_r$ share a minimal
LCA $l$.

This section presents the structural results that provide the intuition and formal basis for the
algorithms presented in Section 4. All these algorithms start with a pair of trees $(T_1, T_2)$ and
then cut edges, expand sets of nodes, remove agreeing components from consideration, and merge
sibling pairs until the resulting forests are identical. The intermediate state is that $T_1$ and $T_2$ have
been resolved and reduced to forests $F_1$ and $F_2$, respectively. $F_1$ consists of a tree $\dot{T}_1$ and a set of
components $F_0$ that exist in $F_2$. $F_2$ has two sets of components. One is $F_0$. The other, $\dot{F}_2$, has
the same label set as $\dot{T}_1$ but may not agree with $\dot{T}_1$. The key in each iteration is deciding which
edges in $\dot{F}_2$ to cut next or which nodes to expand, in order to make progress towards an MAF of $T_1$
and $T_2$. The results in this section identify small edge sets in $\dot{F}_2$ such that at least one edge in each
of these sets has the property that cutting it reduces $e(T_1, T_2, F_2)$ by one. Some of these edges are
introduced by expanding nodes. The approximation algorithm cuts all edges in the identified set,
and the size of the set gives the approximation ratio of the algorithm. The FPT algorithm tries each
edge in the set in turn, so that the size of the set gives the branching factor for a depth-bounded
search algorithm.
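The two ways of using a small candidate set can be illustrated on a much simpler stand-in problem. The sketch below is an analogy only and is not the agreement-forest algorithm: it uses hitting set, where each given set plays the role of an identified edge set of which at least one element must be chosen. All names are hypothetical. Taking every element of an unhit set gives an approximation whose ratio is the set size, and trying the elements one at a time gives a depth-bounded search whose branching factor is the set size.

```python
def next_unhit(sets, chosen):
    """Return some set that contains no chosen element, or None if all are hit."""
    for candidate in sets:
        if not (candidate & chosen):
            return candidate
    return None


def bounded_search(sets, k, chosen=frozenset()):
    """Depth-bounded search: find a hitting set with at most k more elements, or None."""
    candidate = next_unhit(sets, chosen)
    if candidate is None:
        return chosen
    if k == 0:
        return None  # budget exhausted on this branch
    for element in sorted(candidate):  # one branch per element of the set
        found = bounded_search(sets, k - 1, chosen | {element})
        if found is not None:
            return found
    return None


def approximate(sets):
    """Take every element of each unhit set; ratio is the largest set size."""
    chosen = set()
    while (candidate := next_unhit(sets, chosen)) is not None:
        chosen |= candidate
    return chosen


sets = [frozenset({1, 2}), frozenset({2, 3}), frozenset({3, 4})]
exact = bounded_search(sets, 2)
rough = approximate(sets)
```

With sets of size d, the search explores at most $d^k$ branches and the approximation is within a factor d of optimal, mirroring how the size of the identified edge set determines both bounds.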

Let $\{a_1, a_2, \ldots, a_m\}$ be a sibling group of $\dot{T}_1$. If there exist indices $i \ne j$ such that $a_i$ and $a_j$ are
also siblings in $F_2$, we can expand this sibling pair $\{a_i, a_j\}$ and replace $a_i$ and $a_j$ with their parent
node $(a_i, a_j)$ in the sibling group. If there exists an index i such that $F_2^{a_i}$ is a component of $F_2$,
then we can cut $a_i$'s parent edge in $F_1$, thereby removing $a_i$ from the sibling group. Thus, we can
assume $a_i$ and $a_j$ are not siblings in $F_2$, for all $1 \le i < j \le m$, and $F_2^{a_i}$ is not a component of $F_2$,
for all $1 \le i \le m$. We have $a_i \in \dot{F}_2$, for all $1 \le i \le m$, because $\dot{T}_1$ and $\dot{F}_2$ have the same label set.
Let $B_i = \{b_{i1}, b_{i2}, \ldots, b_{iq_i}\}$ be the siblings of $a_i$ in $F_2$, for $1 \le i \le m$. We use $e_x$ to denote the edge
connecting a node x to its parent $p_x$, $e_{B_i}$ to denote the edge introduced by expanding $B_i$, and $p_{B_i}$
to denote the common parent of the nodes in $B_i$. $F_2 - \{e_{B_i}\}$ denotes the forest obtained from $F_2$
by expanding $B_i$ and then cutting $e_{B_i}$, and we use $F_2^{B_i}$ to denote the subforest of $F_2$ comprised of
the subtrees $F_2^{b_{i1}}, F_2^{b_{i2}}, \ldots, F_2^{b_{iq_i}}$.

Consider a subset $\{a_{i_1}, a_{i_2}, \ldots, a_{i_r}\}$ of a sibling group $\{a_1, a_2, \ldots, a_m\}$. We say $a_{i_1}, a_{i_2}, \ldots, a_{i_r}$
share their LCA $l$ if $l = LCA_{F_2}(a_i, a_j)$, for all $i, j \in \{i_1, i_2, \ldots, i_r\}$, $i \ne j$. If, in addition,
$LCA_{F_2}(a_i, a_j)$ is not a proper descendant of $l$, for all $1 \le i < j \le m$, we say that $a_{i_1}, a_{i_2}, \ldots, a_{i_r}$
share a minimal LCA $l$. For simplicity, we always order the elements of the group so that
$\{a_{i_1}, a_{i_2}, \ldots, a_{i_r}\} = \{a_1, a_2, \ldots, a_r\}$ and assume the subset that shares $l$ is maximal, that is, $a_i$ is
not a descendant of $l$, for all $r < i \le m$. We use $B_l$ to denote the set of children of $l$ that do not
have any member $a_i$ of the sibling group as a descendant. Note that $B_l \subseteq B_i$ when $a_i$ is a child of $l$.
These labels are illustrated in Figure 4.

Our first result shows that at least one of the edges $e_{a_1}$, $e_{a_2}$, $e_{B_1}$, and $e_{B_2}$ has the property that
cutting it reduces $e(T_1, T_2, F_2)$ by one. This implies that cutting $e_{a_1}$, $e_{a_2}$, $e_{p_{a_1}}$, and $e_{p_{a_2}}$ reduces
$e(T_1, T_2, F_2)$ by at least 1.



Theorem 1. Let $F_1$ and $F_2$ be forests of rooted X-trees $T_1$ and $T_2$, respectively, and assume $F_1$
consists of a tree $\dot{T}_1$ and a set of components that exist in $F_2$. Let $\{a_1, a_2, \ldots, a_m\}$ be a sibling group
of $\dot{T}_1$ such that either $a_1, a_2, \ldots, a_r$ share a minimal LCA $l$ in $F_2$ or $a_2, a_3, \ldots, a_r$ share a minimal
LCA $l$ in $F_2$ and $a_1 \nsim_{F_2} a_i$, for all $2 \le i \le m$; $a_1$ is not a child of $l$; $a_2$ is not a child of $l$ unless
$r = 2$; $a_i$ and $a_j$ are not siblings in $F_2$, for all $1 \le i < j \le m$; and $F_2^{a_i}$ is not a component of $F_2$,
for all $1 \le i \le m$. Then

(i) $e(T_1, T_2, F_2 - \{e_x\}) = e(T_1, T_2, F_2) - 1$, for some $x \in \{a_1, a_2, B_1, B_2\}$.

(ii) $e(T_1, T_2, F_2 - \{e_{a_1}, e_{a_2}, e_{p_{a_1}}, e_{p_{a_2}}\}) \le e(T_1, T_2, F_2) - 1$.

Proof. (ii) follows immediately from (i) because cutting $\{e_{a_1}, e_{a_2}, e_{p_{a_1}}, e_{p_{a_2}}\}$ is equivalent to cutting
$\{e_{a_1}, e_{a_2}, e_{B_1}, e_{B_2}\}$. For (i), it suffices to prove that there exist a binary resolution F of $F_2$ and an edge
set E of size $e(T_1, T_2, F_2)$ such that $F \div E$ is an MAF of $T_1$ and $T_2$ and $E \cap \{e_{a_1}, e_{a_2}, e_{B_1}, e_{B_2}\} \ne \emptyset$.

So assume $F \div E$ is an MAF of $T_1$ and $T_2$ and $E \cap \{e_{a_1}, e_{a_2}, e_{B_1}, e_{B_2}\} = \emptyset$. By Lemma 2, $F \div E$
is also an MAF of $F_1$ and $F_2$. We prove that we can replace some edge $f \in E$ with an edge in
$\{e_{a_1}, e_{a_2}, e_{B_1}, e_{B_2}\}$ without changing $F \div E$.

First assume $b'_1 \nsim_{F \div E} x$, for all leaves $b'_1 \in F_2^{B_1}$ and $x \notin F_2^{B_1}$. By Lemma 3, expanding
$B_1$ does not change $e(F_1, F_2, F_2)$, so we can assume F contains this expansion.$^2$ Now we choose
an arbitrary leaf $b'_1 \in F_2^{B_1}$ and the first edge $f \in E$ on the path from $p_{B_1}$ to $b'_1$. By Lemma 1,
$F \div E = F \div (E \setminus \{f\} \cup \{e_{B_1}\})$. If $b'_2 \nsim_{F \div E} x$, for all leaves $b'_2 \in F_2^{B_2}$ and $x \notin F_2^{B_2}$, $a'_1 \nsim_{F \div E} p_{a_1}$,
for all leaves $a'_1 \in F_2^{a_1}$, or $a'_2 \nsim_{F \div E} p_{a_2}$, for all leaves $a'_2 \in F_2^{a_2}$, then the same argument shows that
$F \div E = F \div (E \setminus \{f\} \cup \{e_x\})$, for $x = B_2$, $x = a_1$, and $x = a_2$, respectively. Thus, we can assume
there exist leaves $a'_1 \in F_2^{a_1}$, $a'_2 \in F_2^{a_2}$, $b'_1 \in F_2^{B_1}$, and $b'_2 \in F_2^{B_2}$ such that $a'_1 \sim_{F \div E} p_{a_1} \sim_{F \div E} b'_1$ and
$a'_2 \sim_{F \div E} p_{a_2} \sim_{F \div E} b'_2$.

Now recall that either $a_1, a_2, \ldots, a_r$ share the minimal LCA $l$ and $a_1$ is not a child of $l$ or
$a_2, a_3, \ldots, a_r$ share the minimal LCA $l$ and $a_1 \nsim_{F_2} a_i$, for all $2 \le i \le m$. In either case, $a_i \notin F_2^{B_1}$,
for all $1 \le i \le m$. Since $a_i$ also is not an ancestor of $p_{a_1}$, this shows that $b'_1 \notin F_2^{a_i}$, for all
$1 \le i \le m$. Thus, $F_1$ contains the triple $a'_1 a'_2 | b'_1$, while $F_2$ contains the triple $a'_1 b'_1 | a'_2$ or $a'_1 \nsim_{F_2} a'_2$.
By Observation 1, this implies that $a'_1 \sim_{F \div E} p_{a_1} \sim_{F \div E} b'_1 \nsim_{F \div E} a'_2 \sim_{F \div E} p_{a_2} \sim_{F \div E} b'_2$ and, hence,
$b'_2 \notin F_2^{a_1}$. Since $a_2$ is not a child of $l$ unless $r = 2$, we also have $b'_2 \notin F_2^{a_i}$, for all $2 \le i \le m$. Thus,
$F_1$ also contains the triple $a'_1 a'_2 | b'_2$, which implies that the components of $F \div E$ containing $a'_1, b'_1$
and $a'_2, b'_2$ overlap in $F_1$, a contradiction. □

Theorem 1 covers every case where some minimal LCA $l$ exists. If there is no such minimal
LCA, then each $a_i$ must be in a separate component of $F_2$. In the following lemma we show the
stronger result that cutting $e_{a_1}$ or $e_{a_2}$ reduces $e(T_1, T_2, F_2)$ by one in this case (which immediately
implies that claims (i) and (ii) of Theorem 1 also hold in this case).

Lemma 5 (Isolated Siblings). If $a_1 \nsim_{F_2} a_i$, for all $i \ne 1$, $a_2 \nsim_{F_2} a_j$, for all $j \ne 2$, and $F_2^{a_i}$ is not a
component of $F_2$, for all $1 \le i \le m$, then there exist a resolution F of $F_2$ and an edge set E of size
$e(T_1, T_2, F_2)$ such that $F \div E$ is an AF of $T_1$ and $T_2$ and $E \cap \{e_{a_1}, e_{a_2}\} \ne \emptyset$.

Proof. Consider an edge set E of size $e(T_1, T_2, F_2)$ and such that $F \div E$ is an AF of $F_1$ and $F_2$, and
assume E is chosen so that $|E \cap \{e_{a_1}, e_{a_2}\}|$ is maximized. Assume for the sake of contradiction that
$E \cap \{e_{a_1}, e_{a_2}\} = \emptyset$. Then, by the same arguments as in the proof of Theorem 1, there exist leaves
$a'_1 \in F_2^{a_1}$ and $a'_2 \in F_2^{a_2}$ such that $a'_1 \sim_{F \div E} a_1$ and $a'_2 \sim_{F \div E} a_2$. Since $\{a_1, a_2, \ldots, a_m\}$ is a sibling
group of $F_1$ but $a_1 \nsim_{F_2} a_i$, for all $i \ne 1$, and $a_2 \nsim_{F_2} a_j$, for all $j \ne 2$, we must have $a'_1 \nsim_{F \div E} x$, for
all leaves $x \notin F_2^{a_1}$, or $a'_2 \nsim_{F \div E} x$, for all leaves $x \notin F_2^{a_2}$. W.l.o.g. assume the former. Since $F_2^{a_1}$ is
not a component of $F_2$, there exists a leaf $y \notin F_2^{a_1}$ such that $a_1 \sim_{F_2} y$ and, hence, $a'_1 \sim_{F_2} y$. For
each such leaf y, the path from $a'_1$ to y in F contains an edge in E because $a'_1 \nsim_{F \div E} y$, and this
edge does not belong to $F_2^{a_1}$ because $a'_1 \sim_{F \div E} a_1$. We pick an arbitrary such leaf y, and let f be the
first edge in E on the path from $a_1$ to y. The edges $e_{a_1}$ and f satisfy the conditions of Lemma 1,
that is, $F \div E = F \div (E \setminus \{f\} \cup \{e_{a_1}\})$. This contradicts the choice of E. □

$^2$ In fact, using the same ideas as in the proof of Lemma 3, it is not difficult to see that this expansion never
precludes obtaining the same forest $F \div E$ by cutting a different set of $|E|$ edges. We discuss the importance of this to
hybridization and reticulate analysis in Section 5.

Theorem 1 and Lemma 5 are all that is needed to obtain a linear-time 4-approximation algorithm
and an FPT algorithm with running time $O(4^k n)$ for computing rooted MAFs, an observation
made independently in [17]. To improve on this in our algorithms in Section 4, we exploit a useful
observation from the proof of Theorem 1: if there exists an MAF $F \div E$ and leaves $a'_1 \in F_2^{a_1}$ and
$b'_1 \in F_2^{B_1}$ such that $a'_1 \sim_{F \div E} b'_1$, then $a'_2 \nsim_{F \div E} b'_2$, for all $a'_2 \in F_2^{a_2}$ and $b'_2 \in F_2^{B_2}$. This implies that,
if we choose to cut $e_{a_2}$ or $e_{B_2}$ and keep both $e_{a_1}$ and $e_{B_1}$ in a branch of our FPT algorithm, then we
need only decide which edge, $e_{a_j}$ or $e_{B_j}$, to cut in each pair $\{e_{a_j}, e_{B_j}\}$, for $3 \le j \le r$, in subsequent
steps of this branch. This allows us to follow each 4-way branch in the algorithm (where we decide
whether to cut $e_{a_1}$, $e_{B_1}$, $e_{a_2}$ or $e_{B_2}$) by a series of 2-way branches. We cannot use this idea when
a sibling group consists of only two nodes. The following lemma addresses this case. Its proof is
provided in Section S6.2 of the supplementary material.
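The link between branching structure and the base of the exponential running time can be checked numerically. In a depth-bounded search where each branch reduces the parameter k by $d_1, d_2, \ldots, d_t$, the number of leaves grows as $x^k$, where x is the root greater than or equal to 1 of $\sum_i x^{-d_i} = 1$. The sketch below solves this by bisection; `branching_number` is a hypothetical helper, and the vectors shown are illustrative rather than the recurrence analysed in the paper.

```python
def branching_number(vector, tol=1e-12):
    """Root x >= 1 of sum(x ** -d for d in vector) == 1, found by bisection.

    vector: positive integer decreases of the parameter, one per branch
            (at least two branches are assumed).
    """
    def excess(x):
        return sum(x ** -d for d in vector) - 1.0

    lo, hi = 1.0, float(len(vector)) + 1.0  # excess(lo) >= 0 > excess(hi)
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if excess(mid) > 0:
            lo = mid
        else:
            hi = mid
    return hi


four_way = branching_number((1, 1, 1, 1))  # a plain 4-way branch gives base 4
two_way = branching_number((1, 1))         # a plain 2-way branch gives base 2
mixed = branching_number((1, 1, 2))        # illustrative vector: 1 + sqrt(2)
```

A 4-way branch that costs one unit per branch gives base 4, matching the $O(4^k n)$ bound above. Replacing some 4-way branches with 2-way branches lowers the base; for instance, the purely illustrative vector (1, 1, 2) has branching number $1 + \sqrt{2} \approx 2.414$.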



Lemma 6. Let $T_1$ and $T_2$ be rooted X-trees, and let $F_1$ be a forest of $T_1$ and $F_2$ a forest of $T_2$.
Suppose $F_1$ consists of a tree $\dot{T}_1$ and a set of components that exist in $F_2$. Let $\{a_1, a_2\}$ be a sibling
group of $\dot{T}_1$ such that neither $F_2^{a_1}$ nor $F_2^{a_2}$ is a component of $F_2$ and, if $a_1$ and $a_2$ share a minimal
LCA $l$, then $a_1$ is not a child of $l$. In particular, $a_1$ and $a_2$ are not siblings in $F_2$. Then

(i) $e(T_1, T_2, F_2 - \{e_x\}) = e(T_1, T_2, F_2) - 1$, for some $x \in \{a_1, a_2, B_1\}$.

(ii) $e(T_1, T_2, F_2 - \{e_{a_1}, e_{a_2}, e_{p_{a_1}}\}) \le e(T_1, T_2, F_2) - 1$.

Next we examine the structure of a sibling group more closely as a basis for a refined analysis
that leads to our final FPT algorithm with running time $O(2.42^k n)$. First we require the notion
of pendant subtrees that we will be able to cut in unison. Let $a_1, a_2, \ldots, a_r$ be the members of a
sibling group $\{a_1, a_2, \ldots, a_m\}$ that share a minimal LCA $l$ in $F_2$, and consider the path from $a_i$ to $l$,
for any $1 \le i \le r$. Let $x_1, x_2, \ldots, x_{s_i}$ be the nodes on this path, excluding $a_i$ and $l$. For each $x_j$, let
$B_{ij}$ be the set of children of $x_j$, excluding the child that is an ancestor of $a_i$, and let $F_2^{B_{ij}}$ be the
subforest of $F_2$ consisting of all subtrees $F_2^b$, $b \in B_{ij}$. Note that $B_{i1} = B_i$, and $F_2^{B_{i1}} = F_2^{B_i}$ if $s_i > 0$.
Analogously to the definition of $e_{B_i}$, we use $e_{B_{ij}}$, for $1 \le j \le s_i$, to denote the edge introduced by
expanding the nodes in $B_{ij}$ in $F_2$ and $F_2 - \{e_{B_{ij}}\}$ to denote the forest obtained by expanding $B_{ij}$
and cutting edge $e_{B_{ij}}$. Note that expanding $B_{ij}$ turns the forest $F_2^{B_{ij}}$ into a single pendant subtree
attached to $x_j$. We distinguish five cases for the structure of the subtree of $F_2$ induced by the paths
between $a_1, a_2, \ldots, a_r$ and $l$:

Isolated Siblings: $a_1 \nsim_{F_2} a_i$, for all $i \ne 1$, and $a_2 \nsim_{F_2} a_j$, for all $j \ne 2$.

At Most One Pendant Subtree: $a_1, a_2, \ldots, a_r$ share a minimal LCA $l$ in $F_2$, $s_i = 1$, for all
$1 \le i < r$, and $a_r$ is a child of $l$.

One Pendant Subtree: $a_1, a_2, \ldots, a_r$ share a minimal LCA $l$ in $F_2$ and $s_i = 1$, for all $1 \le i \le r$.

Multiple Pendant Subtrees, m = 2: $a_1$ and $a_2$ share a minimal LCA $l$ in $F_2$ and $s_1 + s_2 \ge 2$.

Multiple Pendant Subtrees, m > 2: $a_1, a_2, \ldots, a_r$ share a minimal LCA $l$ in $F_2$ and $s_1 \ge 2$.

Since we assume $F_2^{a_i}$ is not a component of $F_2$, for all $1 \le i \le m$, and no two nodes $a_i$ and $a_j$ in
the sibling group are siblings in $F_2$, we need only consider cases where at most one $a_i$ is a child
of some minimal LCA $l$, and we always label it $a_r$. Hence, $s_i > 0$, for all $i < r$ such that $a_i \sim_{F_2} l$.
Thus, the five cases above cover every possible configuration of a sibling group where we must cut
an edge of $F_2$.

The following four lemmas provide stronger statements than Theorem 1 about subsets of edges
of a resolution F of $F_2$ that need to be cut in each of the last four cases above in order to make
progress towards an AF of $T_1$ and $T_2$. All four lemmas consider a sibling group $\{a_1, a_2, \ldots, a_m\}$ of
$\dot{T}_1$ as in Theorem 1 and assume $F_2^{a_i}$ is not a component of $F_2$, for all $1 \le i \le m$. Lemma 5 above
covers the first of the five cases.

Lemma 7 (At Most One Pendant Subtree). If $a_1, a_2, \ldots, a_r$ share a minimal LCA $l$ in $F_2$, $s_i = 1$,
for $1 \le i < r$, and $a_r$ is a child of $l$, then there exist a binary resolution F of $F_2$ and an edge set E
of size $e(T_1, T_2, F_2)$ such that $F \div E$ is an AF of $T_1$ and $T_2$ and either $\{e_{B_1}, e_{B_2}, \ldots, e_{B_{r-1}}\} \subseteq E$ or
$\{e_{B_1}, e_{B_2}, \ldots, e_{B_{i-1}}, e_{B_{i+1}}, e_{B_{i+2}}, \ldots, e_{B_{r-1}}\} \subseteq E$ and $a'_i \sim_{F \div E} b'_i$, for some $1 \le i \le r - 1$ and two
leaves $a'_i \in F_2^{a_i}$ and $b'_i \in F_2^{B_i}$.

Proof. Let F be a binary resolution of $F_2$, and E an edge set of size $e(T_1, T_2, F_2)$ such that $F \div E$ is an
AF of $F_1$ and $F_2$, and assume F and E are chosen so that $|E \cap \{e_{a_1}, e_{a_2}, \ldots, e_{a_r}, e_{B_1}, e_{B_2}, \ldots, e_{B_{r-1}}\}|$
is maximized. Let $I := \{i \mid 1 \le i \le r - 1 \text{ and } E \cap \{e_{a_i}, e_{B_i}\} \ne \emptyset\}$ and $I' := \{i \mid 1 \le i \le
r - 1 \text{ and } e_{a_i} \in E\}$.

First observe that $|I| \ge r - 2$. Otherwise there would exist two indices $1 \le i < j \le r - 1$ such
that $E \cap \{e_{a_i}, e_{B_i}, e_{a_j}, e_{B_j}\} = \emptyset$. By the choice of F and Lemma 1, this would imply that there
exist leaves $a'_i \in F_2^{a_i}$, $b'_i \in F_2^{B_i}$, $a'_j \in F_2^{a_j}$, and $b'_j \in F_2^{B_j}$ such that $a'_i \sim_{F \div E} b'_i$ and $a'_j \sim_{F \div E} b'_j$. If
$a'_i \sim_{F \div E} a'_j$, then $a'_i b'_i | a'_j$ would be a triple of $F \div E$ incompatible with $F_1$. If $a'_i \nsim_{F \div E} a'_j$, the
components of $F \div E$ containing $a'_i$ and $a'_j$ would overlap in $F_1$. In both cases, $F \div E$ would not be
an AF of $F_1$ and $F_2$, a contradiction.

Since $|I| \ge r - 2$ and $i \notin I$ implies that there are two leaves $a'_i \in F_2^{a_i}$ and $b'_i \in F_2^{B_i}$ such that
$a'_i \sim_{F \div E} b'_i$, the lemma follows if we can show that there exist a resolution $F'$ of $F_2$ and an edge set
$E'$ of size $|E'| \le |E|$ such that (i) $F' \div E'$ is an AF of $F_1$ and $F_2$, (ii) $\{e_{B_i} \mid i \in I\} \subseteq E'$, and (iii)
for any $1 \le i \le r - 1$, there exist leaves $a'_i \in F_2^{a_i}$ and $b'_i \in F_2^{B_i}$ such that $a'_i \sim_{F' \div E'} b'_i$ if and only if
there exist leaves $a''_i \in F_2^{a_i}$ and $b''_i \in F_2^{B_i}$ such that $a''_i \sim_{F \div E} b''_i$.

Let Y be the set of leaves in all trees $F_2^{a_i}$, $i \in I' \cup \{r\}$, and $F_2^{B_i}$, $i \in I'$, let $Z := X \setminus Y \cup \{a'_r\}$,
for an arbitrary leaf $a'_r \in F_2^{a_r}$, let $l'$ be the LCA in F of all nodes in Y, and let $E''$ be the set of
edges in E that belong to the paths between $l'$ and the nodes in $\{a_i \mid i \in I' \cup \{r\}\}$. Since $e_{a_i} \in E$,
for all $i \in I'$, we have $|E''| \ge |I'|$. We construct $F'$ from $F_2$ by resolving every node set $B_j$ where
$j \in I'$, resolving the set $\{a_r\} \cup \{p_{a_i} \mid i \in I'\}$, and resolving all remaining multifurcations so that
$F|Y = F'|Y$ and $F|Z = F'|Z$. We define the set $E'$ to be $E' := E \setminus E'' \cup \{e_{B_i} \mid i \in I'\}$ if $|E''| = |I'|$;
otherwise $E' := E \setminus E'' \cup \{e_{B_i} \mid i \in I'\} \cup \{e_{l''}\}$, where $l''$ is the LCA in $F'$ of all nodes in Y. It is
easily verified that $F'$ and $E'$ satisfy properties (ii) and (iii) and that $|E'| \le |E|$. Thus, it remains
to prove that $F' \div E'$ is an AF of $F_1$ and $F_2$.

Any triple of $F' \div E'$ incompatible with $F_1$ has to involve exactly one leaf $a'_i \in F_2^{a_i}$, for some
$i \in I'$, because any other triple exists either in $F \div E$ or in $F_1$ and, thus, is compatible with $F_1$.
Thus, any triple $a'_i | xy$ or $a'_i x | y$ of $F' \div E'$ incompatible with $F_1$ must satisfy $x, y \notin (F')^{l''}$ because
$e_{B_i} \in E'$, for all $i \in I'$. If $|E''| > |I'|$, no such triple exists because $e_{l''} \in E'$. If $|E''| = |I'|$, observe
that $x, y \notin (F')^{l''}$ implies that $a'_r | xy$ or $a'_r x | y$ is also a triple of $F' \div E'$ incompatible with $F_1$. By the
construction of $F'$, this triple is also a triple of F, and since $E \setminus E' = \{e_{a_i} \mid i \in I'\}$ and $x, y \notin F_2^{a_i}$,
for all $i \in I'$, it is also a triple of $F \div E$ incompatible with $F_1$, a contradiction.

If two components of $F' \div E'$ overlap in $F_1$, let u, v, x, y be four leaves such that $u \sim_{F' \div E'}
v \nsim_{F' \div E'} x \sim_{F' \div E'} y$ and the two paths $P_{uv}$ and $P_{xy}$ between u and v and between x and y,
respectively, share an edge. Let $Y'$ be the set of leaves in the subtrees $F_2^{a_i}$, $i \in I' \cup \{r\}$, and assume
$P_{uv}$ has no fewer endpoints in $Y'$ than $P_{xy}$.

If both endpoints of $P_{uv}$ are in $Y'$, the two paths cannot overlap because $a_1, a_2, \ldots, a_r$ are
siblings in $F_1$ and our choices of $F'$ and $E'$ ensure that all leaves in the same subtree $F_2^{a_i}$ belong to
the same component of $F' \div E'$.

If both $P_{uv}$ and $P_{xy}$ have one leaf in $Y'$, their corresponding paths in $F' \div E'$ include [...]. Thus, [...]

If neither $P_{uv}$ nor $P_{xy}$ has an endpoint in $Y'$, then $u \sim_{F \div E} v$ and $x \sim_{F \div E} y$. If $P_{uv}$ has one
endpoint in $Y'$, say $u \in Y'$, and $P_{xy}$ does not have an endpoint in $Y'$, then $x \sim_{F \div E} y$ and, by the
same arguments we used to show that no triple of $F' \div E'$ is incompatible with $F_1$, there exists a leaf
$a'_r \in F_2^{a_r}$ such that $a'_r \sim_{F \div E} v$ and the path from $a'_r$ to v in $F_1$ overlaps $P_{xy}$. Thus, in both cases we
have two paths $P_{u'v}$, $u' \in \{u, a'_r\}$, and $P_{xy}$ in $F_1$ such that $u' \sim_{F \div E} v$ and $x \sim_{F \div E} y$ and the two
paths overlap. Since no two components of $F \div E$ overlap in $F_1$, these two paths belong to the same
component of $F \div E$ and w.l.o.g. form the quartet $u'x | vy$, while $u' \sim_{F' \div E'} v \nsim_{F' \div E'} x \sim_{F' \div E'} y$.
Since $E' \setminus E \subseteq \{e_{B_i} \mid i \in I'\} \cup \{e_{l''}\}$, we either have $x, y \in F_2^{B_i}$ and $u', v \notin F_2^{B_i} \cup F_2^{a_i}$, for some
$i \in I'$, or $u', v \in (F')^{l''}$, $u', v \notin F_2^{B_i}$, for all $i \in I'$, and $x, y \notin (F')^{l''}$. In the former case, these four
leaves form the quartet $u'v | xy$ in $F \div E$, a contradiction. In the latter case, we have $u', v \in Y'$, and
we already argued that this case is impossible. □

Lemma 8 (One Pendant Subtree). If $a_1, a_2, \ldots, a_r$ share a minimal LCA $l$ in $F_2$, $m > 2$, and $s_i = 1$, for all $1 \le i \le r$, then there exist a resolution $F$ of $F_2$ and an edge set $E$ of size $e(T_1, T_2, F_2)$ such that $F \div E$ is an AF of $T_1$ and $T_2$ and either $\{e_{a_1}, e_{a_2}, \ldots, e_{a_r}\} \subseteq E$, $\{e_{B_1}, e_{B_2}, \ldots, e_{B_r}\} \subseteq E$ or there exists an index $1 \le i \le r$ and two leaves $a_i' \in F_2^{a_i}$ and $b_i' \in F_2^{B_i}$ such that $a_i' \sim_{F \div E} b_i'$ and either $\{e_{a_1}, \ldots, e_{a_{i-1}}, e_{a_{i+1}}, \ldots, e_{a_r}\} \subseteq E$ or $\{e_{B_1}, \ldots, e_{B_{i-1}}, e_{B_{i+1}}, \ldots, e_{B_r}\} \subseteq E$.

Proof. Let $F$ be a resolution of $F_2$, and $E$ an edge set of size $e(T_1, T_2, F_2)$ such that $F \div E$ is an AF of $F_1$ and $F_2$, and assume $F$ and $E$ are chosen so that $|E \cap \{e_{a_1}, e_{a_2}, \ldots, e_{a_r}, e_{B_1}, e_{B_2}, \ldots, e_{B_r}\}|$ is maximized.

If there exists an index $1 \le j \le r$ such that $e_{B_j} \in E$, then assume w.l.o.g. that $j = r$. The forests $F_1$ and $F_2' := F_2 \div \{e_{B_r}\}$ satisfy the conditions of Lemma 7. Hence, there exist a resolution $F'$ of $F_2'$ and an edge set $E'$ of size $e(T_1, T_2, F_2') = e(T_1, T_2, F_2) - 1$ such that $F' \div E'$ is an AF of $F_1$ and $F_2'$ and, thus, of $F_1$ and $F_2$ and either $\{e_{B_1}, e_{B_2}, \ldots, e_{B_{r-1}}\} \subseteq E'$ or $a_i' \sim_{F' \div E'} b_i'$ and $\{e_{B_1}, \ldots, e_{B_{i-1}}, e_{B_{i+1}}, \ldots, e_{B_{r-1}}\} \subseteq E'$, for some index $1 \le i \le r - 1$ and leaves $a_i' \in F_2^{a_i}$ and $b_i' \in F_2^{B_i}$. Thus, the resolution $F''$ of $F_2$ such that $F'' \div \{e_{B_r}\} = F'$ and the edge set $E' \cup \{e_{B_r}\}$ satisfy the lemma.

If $E \cap \{e_{B_1}, e_{B_2}, \ldots, e_{B_r}\} = \emptyset$, then by the same arguments as in Lemma 7, we can have at most one index $1 \le i \le r$ such that $a_i' \sim_{F \div E} b_i'$, for two leaves $a_i' \in F_2^{a_i}$ and $b_i' \in F_2^{B_i}$. If such an index $i$ exists, then $\{e_{a_1}, \ldots, e_{a_{i-1}}, e_{a_{i+1}}, \ldots, e_{a_r}\} \subseteq E$. If no such index exists, then $\{e_{a_1}, e_{a_2}, \ldots, e_{a_r}\} \subseteq E$. □

Lemma 9 (Multiple Pendant Subtrees, m = 2). If $\{a_1, a_2\}$ is a sibling group such that $a_1$ and $a_2$ share a minimal LCA $l$ in $F_2$ and $s_1 + s_2 \ge 2$, then there exist a resolution $F$ of $F_2$ and an edge set $E$ of size $e(T_1, T_2, F_2)$ such that $F \div E$ is an AF of $T_1$ and $T_2$ and either $E \cap \{e_{a_1}, e_{a_2}\} \ne \emptyset$ or $\{e_{B_{11}}, e_{B_{12}}, \ldots, e_{B_{1 s_1}}\} \cup \{e_{B_{21}}, e_{B_{22}}, \ldots, e_{B_{2 s_2}}\} \subseteq E$.
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Proof. We prove this by induction on s = s% + S2- The base case is s = ij^Jand the claim follows 
from Lemma HI 

Having established the base case, we can assume s > 1 and the lemma holds for all 1 < s' < s. 
By Theorem [TJ there exist a resolution F of F2 and an edge set E of size e(Ti,T2, F2) such that 
F -T- F is an AF of F% and F2 and F n {e ai , e a2 ,e Bl ,e B2 } 7^ 0. Assume F and E are chosen so 
that \E n {e ai , e a2 , e Bl , es 2 }| is maximized. If E D {e ai , e a2 } 7^ 0, the lemma holds, so assume the 
contrary. If S2 = and E n {e ai ,e a2 ,e Bl ,e B2 } = {e_B 2 }, the choice of F and E and Lemma [l] imply 
that there exist leaves a[ G F^ 1 , b\ G FrP 1 , a' 2 6 F2 2 , and x ^ F 2 2 such that ~^ 2 _^{ es2 } b\ and 
a' 2 ~i7 2 -^{ efl2 } x. Thus, a\ and 02 satisfy the conditions of Lemmap^after cutting edge e B2 , which 
implies that we can choose F and E so that ECi{e ai , e a2 } / in addition to e Bi G F, a contradiction. 
Now assume S2 7^ or E n {e ai , e a2 , , eg 2 } 7^ {e_B 2 }- m this case, E n {eB u , es 21 } 7^ 0. Assume 
w.l.o.g. that e Bll G F. Then the inductive hypothesis shows that there exist a resolution F' of F2 and 
an edge set E' of size e(Ti, T2, F2) such that F'-=-F' is an AF of Fi and i^-^je^} (and, hence, of Fi 
and F 2 ) such that either F' n {e ai ,e a2 } / or {e Bll ,e Bl2 , • . • ,es lsi } U {e B21 ,e B22 , ■ ■ ■ ,e B2a2 } Q E' . 
Thus, the lemma also holds in this case. □ 

The proofs of the following two lemmas are similar to that of Lemma 9. They are provided in Sections S6.3 and S6.4 of the supplementary material.



Lemma 10 (Multiple Pendant Subtrees, m > 2 and r > 2). If $a_1, a_2, \ldots, a_r$ share a minimal LCA $l$ in $F_2$, $m > 2$, $r > 2$, and $s_1 \ge 2$, then there exist a resolution $F$ of $F_2$ and an edge set $E$ of size $e(T_1, T_2, F_2)$ such that $F \div E$ is an AF of $T_1$ and $T_2$ and $E \cap \{e_{a_1}, e_{a_2}\} \ne \emptyset$, $\{e_{B_{11}}, e_{B_{12}}, \ldots, e_{B_{1 s_1}}\} \subseteq E$ or $\{e_{B_{21}}, e_{B_{22}}, \ldots, e_{B_{2 s_2}}\} \subseteq E$.

Lemma 11 (Multiple Pendant Subtrees, m > 2 and r = 2). If $a_1, a_2, \ldots, a_r$ share a minimal LCA $l$ in $F_2$, $m > 2$, $r = 2$ and $s_1 \ge 2$, then there exist a resolution $F$ of $F_2$ and an edge set $E$ of size $e(T_1, T_2, F_2)$ such that $F \div E$ is an AF of $T_1$ and $T_2$ and $E \cap \{e_{a_1}, e_{a_2}\} \ne \emptyset$, $\{e_{B_{11}}, e_{B_{12}}, \ldots, e_{B_{1 s_1}}\} \subseteq E$ or $\{e_{B_{21}}, e_{B_{22}}, \ldots, e_{B_{2 s_2}}, e_{B_2'}\} \subseteq E$, where $B_2'$ is the set of siblings of $a_2$ after cutting $e_{B_{21}}, e_{B_{22}}, \ldots, e_{B_{2 s_2}}$.

4 MAF Algorithms 

In this section, we present an FPT algorithm for computing MAFs of multifurcating rooted trees. 
This algorithm also forms the basis for a 3-approximation algorithm with running time $O(n \log n)$, which is presented in Section S8 of the supplementary material.

As is customary for FPT algorithms, we focus on the decision version of the problem: "Given two rooted X-trees $T_1$ and $T_2$ and a parameter $k$, is $d_{sSPR}(T_1, T_2) \le k$?" To compute the distance between two trees, we start with $k = 0$ and increase it until we receive an affirmative answer. This does not increase the running time of the algorithm by more than a constant factor, as the running time depends exponentially on $k$.
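The search over $k$ described above can be sketched as follows. This is only an illustration under stated assumptions: `decide` is a hypothetical stand-in for the decision procedure, and `toy_decide` with its `hidden_distance` is an invented oracle used purely to exercise the loop.

```python
from typing import Callable, List, Optional


def smallest_yes(decide: Callable[[int], bool], k_max: int) -> Optional[int]:
    """Return the smallest k in [0, k_max] for which decide(k) is true."""
    for k in range(k_max + 1):
        if decide(k):
            return k
    return None  # no affirmative answer up to k_max


# Hypothetical oracle: answers "yes" exactly when the hidden distance is <= k.
hidden_distance = 3
calls: List[int] = []


def toy_decide(k: int) -> bool:
    calls.append(k)  # record the order in which parameters are tried
    return hidden_distance <= k
```

Because the cost of one decision call grows geometrically in $k$, the total cost of all calls with smaller parameters is dominated by the last one, which is the reason the loop costs only a constant factor.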

Our FPT algorithm is recursive. Each invocation Maf($F_1, F_2, k, a_0$) takes two (partially resolved) forests $F_1$ and $F_2$ of $T_1$ and $T_2$, a parameter $k$, and (optionally) a node $a_0$ that exists in $F_1$ and $F_2$ as inputs. $F_1$ is the union of a tree $\dot T_1$ and a forest $F_0$ disjoint from $\dot T_1$, while $F_2$ is the union of the same forest $F_0$ and another forest $\dot F_2$ with the same label set as $\dot T_1$. The output of the invocation Maf($F_1, F_2, k, a_0$) satisfies two conditions: (i) If $e(T_1, T_2, F_2) > k$, the output is "no". (ii) If $e(T_1, T_2, F_2) \le k$ and either $a_0 =$ nil or there exists an MAF $F$ of $F_1$ and $F_2$ such that $a_0$ is not a root of $F$ and $a_0 \not\sim_F a_i$, for every sibling $a_i$ of $a_0$ in $F_1$, the output is "yes". Since the top-level invocation is Maf($T_1, T_2, k$, nil), these two conditions ensure that this invocation decides whether $e(T_1, T_2, T_2) \le k$.

3 We excluded this case from the statement of the lemma, in order to keep the cases covered by the different lemmas disjoint, but the lemma also holds for $s = 1$. A similar comment applies to Lemma 10.

The representation of the input to each recursive call includes two sets of labelled nodes: $R_d$ (roots-done) contains the roots of $F_0$, $R_t$ (roots-todo) contains the roots of (not necessarily maximal) subtrees that agree between $\dot T_1$ and $\dot F_2$. We refer to the nodes in these sets by their labels. For the top-level invocation, $F_1 = \dot T_1 = T_1$, $F_2 = \dot F_2 = T_2$, and $F_0 = \emptyset$; $R_d$ is empty and $R_t$ contains all leaves of $T_1$; $a_0 =$ nil.

Maf($F_1, F_2, k, a_0$) uses the results from Section 3 to identify a small collection $\{E_1, E_2, \ldots, E_q\}$ of subsets of edges of $F_2$ such that $e(T_1, T_2, F_2) \le k$ only if $e(T_1, T_2, F_2 \div E_i) \le k - |E_i|$, for at least one $1 \le i \le q$. It calls Maf($F_1, F_2 \div E_i, k - |E_i|, a_i'$) recursively, for each subset $E_i$ and an appropriate parameter $a_i'$, and returns "yes" if and only if one of these recursive calls does.
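The branching scheme of the previous paragraph has the shape of a generic bounded search tree. The sketch below shows only that shape and is not the paper's procedure: `is_solved`, `propose` and `cut` are hypothetical callbacks, and the toy instance (a set of conflicting pairs, one element of which must be removed per pair) is an invented stand-in for the forests and edge sets.

```python
from typing import Callable, FrozenSet, Hashable, Iterable, Optional, Tuple

Branch = Tuple[FrozenSet[Hashable], Optional[Hashable]]  # (E_i, a_0 for the child)


def bounded_search(state, k: int, a0, is_solved: Callable, propose: Callable,
                   cut: Callable) -> bool:
    """Answer "yes" iff some branch stays within the budget k."""
    if k < 0:                 # budget exhausted (failure)
        return False
    if is_solved(state):      # nothing left to cut (success)
        return True
    return any(
        bounded_search(cut(state, edges), k - len(edges), next_a0,
                       is_solved, propose, cut)
        for edges, next_a0 in propose(state, a0)
    )


# Toy stand-in: the state is a set of conflicting pairs.
def toy_is_solved(conflicts) -> bool:
    return not conflicts


def toy_propose(conflicts, a0) -> Iterable[Branch]:
    u, v = min(conflicts)     # branch on the two ways to resolve one conflict
    return [(frozenset({u}), None), (frozenset({v}), None)]


def toy_cut(conflicts, edges):
    return frozenset(p for p in conflicts if not (set(p) & edges))


toy_conflicts = frozenset({(1, 2), (2, 3), (3, 4)})
```

Each child receives the budget $k - |E_i|$, so the depth of the recursion, and with it the size of the search tree, is bounded in terms of $k$ alone.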

A naive use of the structural results from Section 3 would explore many overlapping edge subsets. For example, one branch of the algorithm may cut an edge $e_{a_i}$ and then an edge $e_{a_j}$, while a sibling branch may cut $e_{a_j}$ and then $e_{a_i}$. As we hinted at in Section 3, if we cut edge $e_{a_i}$ or its sibling edge in two sibling invocations, then there is no need to consider cutting either of these two edges in their sibling invocations or their descendants. Using the pendant-subtree lemmas of Section 3, we obtain more generally: if we cut $e_{a_i}$ or its set of progressive siblings $\{e_{B_{i1}}, e_{B_{i2}}, \ldots, e_{B_{i s_i}}\}$ in two sibling invocations, then we need not consider these edge sets in their sibling invocations or their descendants. Thus, we set $a_0 = a_i$ in these sibling invocations and thereby instruct the algorithm to ignore these edges as candidates for cutting. An invocation Maf($F_1, F_2, k, a_0$) with $a_0 \ne$ nil (Step 7 below) makes only two recursive calls when it would make significantly more recursive calls if $a_0 =$ nil (Step 8 below). This is not a trivial change, as it is required to obtain the running time claimed in Theorem 2. The steps of our procedure are as follows.

1. (Failure) If $k < 0$, then $e(T_1, T_2, F_2) \ge 0 > k$. Return "no" in this case.

2. (Success) If $|R_t| = 1$, then $F_1 = F_2$. Hence, $F_2$ is an AF of $T_1$ and $T_2$, that is, $e(T_1, T_2, F_2) = 0 \le k$. Return "yes" in this case.

3. (Prune maximal agreeing subtrees) If there is no node $r \in R_t$ that is a root in $F_2$, proceed to Step 4. Otherwise choose such a node $r \in R_t$; remove it from $R_t$ and add it to $R_d$, thereby moving the corresponding subtree of $\dot F_2$ to $F_0$; and cut the edge $e_r$ in $F_1$. If $r$'s parent $p_r$ in $F_1$ now has only one child, contract $p_r$. If $a_0 \ne$ nil and $p_r$'s only child before the contraction was $a_0$, set $a_0 =$ nil. Note that these changes affect only $F_1$. Thus, $e(T_1, T_2, F_2)$ remains unchanged. Return to Step 2.

4. Choose a sibling group $\{a_1, a_2, \ldots, a_m\}$ in $\dot T_1$ such that $a_1, a_2, \ldots, a_m \in R_t$. If two or more members of the sibling group chosen in this invocation's parent invocation remain in $\dot T_1$, choose that sibling group.

5. (Grow agreeing subtrees) While there exist indices $1 \le i < j \le m$ such that $a_i$ and $a_j$ are siblings in $F_2$, do the following: Remove $a_i$ and $a_j$ from $R_t$; resolve $a_i$ and $a_j$ in $\dot T_1$ and $F_2$; label their new parent in both forests with $(a_i, a_j)$ and add it to $R_t$. The new node $(a_i, a_j)$ becomes a member of the current sibling group and $m$ decreases by 1. If $m = 1$ after resolving all such sibling pairs $\{a_i, a_j\}$, contract the parent of the only remaining member of the sibling group and return to Step 2; otherwise proceed to Step 6.

6. If $a_i \not\sim_{F_2} a_j$, for all $1 \le i < j \le m$, proceed to Step 7. Otherwise there exists a node $l$ that is a minimal LCA of a group of nodes in the current sibling group. If the most recent minimal LCA chosen in an ancestor invocation is a minimal LCA of a subset of nodes in the current sibling group, choose $l$ to be this node; otherwise choose $l$ arbitrarily. Now order the nodes in the sibling group $\{a_1, a_2, \ldots, a_m\}$ so that, for some $r \ge 2$, $a_1, a_2, \ldots, a_r$ are descendants of $l$, while, for all $1 \le i \le r < j \le m$, either the LCA of $a_i$ and $a_j$ is a proper ancestor of $l$ or $a_i \not\sim_{F_2} a_j$. Order $a_1, a_2, \ldots, a_r$ so that $s_1 \ge s_2 \ge \cdots \ge s_r$. (Recall that $s_i$ is the number of nodes on the path from $a_i$ to $l$, excluding $a_i$ and $l$.) The order of $a_{r+1}, a_{r+2}, \ldots, a_m$ is arbitrary.

Figure 5: The different cases of Step 7. Only the subtree of $F_2$ rooted in $l$ is shown. The left side shows a possible input for each case, the right side visualizes the cuts made in each recursive call. The node $a_0$ is shown as a hollow circle (if it is a descendant of $l$).

7. (Two-way branching) If $a_0 =$ nil, proceed to Step 8. Otherwise distinguish four cases, where $x = 1$ if $a_1 \ne a_0$, and $x = 2$ otherwise (see Figure 5):

7.1. If $a_1 \not\sim_{F_2} a_i$, for all $i \ne 1$, and $a_2 \not\sim_{F_2} a_j$, for all $j \ne 2$, call Maf($F_1, F_2 \div \{e_{a_x}\}, k - 1, a_0$).

7.2. If $a_1, a_2, \ldots, a_r$ share the minimal LCA $l$ in $F_2$ and $a_0$ is not a descendant of $l$, call Maf($F_1, F_2 \div \{e_{B_{11}}, e_{B_{12}}, \ldots, e_{B_{1 s_1}}\}, k - s_1, a_0$). If $s_1 > 1$ or $s_r > 0$, also call Maf($F_1, F_2 \div \{e_{a_1}\}, k - 1, a_0$).

7.3. If $r = 2$, $s_x = 0$, $a_0$ is a descendant of $l$, and either $l$ is a root of $F_2$ or its parent has a member $a_j$ of the current sibling group as a child, call Maf($F_1, F_2 \div \{e_{B_x}\}, k - 1, a_0$).

7.4. If $a_1, a_2, \ldots, a_r$ share the minimal LCA $l$ in $F_2$, $a_0$ is a descendant of $l$, and Case 7.3 does not apply, call Maf($F_1, F_2 \div \{e_{a_x}\}, k - 1, a_0$). If $m > 2$ and $r > 2$, make another recursive call Maf($F_1, F_2 \div \{e_{B_{x1}}, e_{B_{x2}}, \ldots, e_{B_{x s_x}}\}, k - s_x, a_0$). If $m > 2$ but $r = 2$, the second recursive call is Maf($F_1, F_2 \div \{e_{B_{x1}}, e_{B_{x2}}, \ldots, e_{B_{x s_x}}, e_{B_x'}\}, k - (s_x + 1), a_0$), where $B_x'$ is the set of siblings of $a_x$ after cutting $e_{B_{x1}}, e_{B_{x2}}, \ldots, e_{B_{x s_x}}$.

Return "yes" if one of the recursive calls does; otherwise return "no".

8. (Unconstrained branching) Distinguish seven cases and choose the first case that applies (see Figure 6):

8.1. If $a_1 \not\sim_{F_2} a_i$, for all $i \ne 1$, and $a_2 \not\sim_{F_2} a_j$, for all $j \ne 2$, call Maf($F_1, F_2 \div \{e_{a_1}\}, k - 1$, nil) and Maf($F_1, F_2 \div \{e_{a_2}\}, k - 1$, nil).

8.2. If $s_i = 1$, for $1 \le i < r$, and $a_r$ is a child of $l$, call Maf($F_1, F_2 \div \{e_{B_1}, e_{B_2}, \ldots, e_{B_{r-1}}\}, k - (r - 1)$, nil) and Maf($F_1, F_2 \div \{e_{B_1}, \ldots, e_{B_{i-1}}, e_{B_{i+1}}, \ldots, e_{B_{r-1}}\}, k - (r - 2), a_i$), for all $1 \le i \le r - 1$.

8.3. If $m = 2$ and $s_1 + s_2 \ge 2$, call Maf($F_1, F_2 \div \{e_{a_1}\}, k - 1$, nil), Maf($F_1, F_2 \div \{e_{a_2}\}, k - 1$, nil), and Maf($F_1, F_2 \div (\{e_{B_{11}}, e_{B_{12}}, \ldots, e_{B_{1 s_1}}\} \cup \{e_{B_{21}}, e_{B_{22}}, \ldots, e_{B_{2 s_2}}\}), k - (s_1 + s_2)$, nil).

8.4. If $m > 2$, $r = 2$, and $s_1 = s_2 = 1$, call Maf($F_1, F_2 \div \{e_{a_1}, e_{a_2}\}, k - 2$, nil), Maf($F_1, F_2 \div \{e_{B_1}, e_{B_2}\}, k - 2$, nil), Maf($F_1, F_2 \div \{e_{B_1}, e_{B_1'}\}, k - 2, a_2$), and Maf($F_1, F_2 \div \{e_{B_2}, e_{B_2'}\}, k - 2, a_1$), where $B_i'$ is the set of siblings of $a_i$ after cutting edge $e_{B_i}$. If $l$ is a root, $l$'s parent $p_l$ has at least one child that is neither $l$ nor a member $a_j$ of the current sibling group or $l$'s grandparent has at least one child that is neither $p_l$ nor a member $a_h$ of the current sibling group, make two additional calls Maf($F_1, F_2 \div \{e_{a_1}\}, k - 1, a_2$) and Maf($F_1, F_2 \div \{e_{a_2}\}, k - 1, a_1$).

8.5. If $m > 2$, $r > 2$, and $s_i = 1$, for all $1 \le i \le r$, call Maf($F_1, F_2 \div \{e_{a_1}, e_{a_2}, \ldots, e_{a_r}\}, k - r$, nil), Maf($F_1, F_2 \div \{e_{B_1}, e_{B_2}, \ldots, e_{B_r}\}, k - r$, nil), Maf($F_1, F_2 \div \{e_{a_1}, \ldots, e_{a_{i-1}}, e_{a_{i+1}}, \ldots, e_{a_r}\}, k - (r - 1), a_i$), for all $1 \le i \le r$, and Maf($F_1, F_2 \div \{e_{B_1}, \ldots, e_{B_{i-1}}, e_{B_{i+1}}, \ldots, e_{B_r}\}, k - (r - 1), a_i$), for all $1 \le i \le r$.

8.6. If $m > 2$, $r = 2$, $s_1 \ge 2$, $s_2 = 0$ and either $l$ is a root of $F_2$ or its parent has a member $a_j$ of the current sibling group as a child, call Maf($F_1, F_2 \div \{e_{a_1}\}, k - 1$, nil), Maf($F_1, F_2 \div \{e_{B_{11}}, e_{B_{12}}, \ldots, e_{B_{1 s_1}}\}, k - s_1$, nil), and Maf($F_1, F_2 \div \{e_{B_2}\}, k - 1$, nil).

8.7. If $m > 2$, $s_1 \ge 2$, and Case 8.6 does not apply, call Maf($F_1, F_2 \div \{e_{a_1}\}, k - 1$, nil), Maf($F_1, F_2 \div \{e_{a_2}\}, k - 1, a_1$), and Maf($F_1, F_2 \div \{e_{B_{11}}, e_{B_{12}}, \ldots, e_{B_{1 s_1}}\}, k - s_1$, nil). If $r > 2$, call Maf($F_1, F_2 \div \{e_{B_{21}}, e_{B_{22}}, \ldots, e_{B_{2 s_2}}\}, k - s_2, a_1$). If $r = 2$, call Maf($F_1, F_2 \div \{e_{B_{21}}, e_{B_{22}}, \ldots, e_{B_{2 s_2}}, e_{B_2'}\}, k - (s_2 + 1), a_1$), where $B_2'$ is defined as in Lemma 11.

Return "yes" if one of the recursive calls does; otherwise return "no".
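Step 5 (growing agreeing subtrees) can be illustrated with a small sketch. It rests on assumptions that are not taken from the paper: each forest is stored as a hypothetical child-to-parent dictionary in which roots map to `None`, every node (roots included) appears as a key, and the members of `group` are siblings in the first forest. The bookkeeping for $R_t$ is reduced to the returned list, and sibling lookups are done by a linear scan rather than in the time bounds the paper relies on.

```python
from typing import Dict, Hashable, List, Optional

Forest = Dict[Hashable, Optional[Hashable]]  # child -> parent, roots map to None


def children(parent: Forest, v: Hashable) -> List[Hashable]:
    return [c for c, p in parent.items() if p == v]


def resolve_pair(parent: Forest, a: Hashable, b: Hashable) -> Hashable:
    """Give siblings a and b a new common parent labelled (a, b)."""
    old = parent[a]
    new = (a, b)
    parent[new] = old
    parent[a] = parent[b] = new
    if children(parent, old) == [new]:      # old parent became unary: contract it
        parent[new] = parent.pop(old)
    return new


def grow_agreeing_subtrees(parent1: Forest, parent2: Forest,
                           group: List[Hashable]) -> List[Hashable]:
    """Merge group members that are also siblings in the second forest."""
    group = list(group)
    while True:
        pair = next(((a, b) for i, a in enumerate(group) for b in group[i + 1:]
                     if parent2[a] is not None and parent2[a] == parent2[b]), None)
        if pair is None:
            return group                    # no two members are siblings in forest 2
        a, b = pair
        resolve_pair(parent1, a, b)
        new = resolve_pair(parent2, a, b)
        group = [g for g in group if g != a and g != b] + [new]
```

For example, if `a`, `b`, `c` are siblings in the first forest while the second forest groups `a` and `b` below an extra node, the loop first merges `a` and `b` and then merges the result with `c`, leaving a single member.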

Theorem 2. Given two rooted X-trees $T_1$ and $T_2$ and a parameter $k$, it takes $O((1 + \sqrt{2})^k n) = O(2.42^k n)$ time to decide whether $e(T_1, T_2, T_2) \le k$.

Proof. We use the algorithm in this section, invoking it as Maf($T_1, T_2, k$, nil). We leave its correctness proof to a separate lemma (Lemma 12 below) and focus on bounding its running time here. As we argue in Section S7 of the supplementary material, each invocation Maf($F_1, F_2, k', a_0$) takes $O(n)$ time. Thus, it suffices to bound the number of invocations by $O((1 + \sqrt{2})^k)$. Let $I(k, t)$ be the number of invocations that are descendants of an invocation Maf($F_1, F_2, k, a_0$) in the recursion tree, where $t = 1$ if the invocation executes Step 7 but not Step 8; otherwise $t = 0$. We develop a recurrence relation for $I(k, t)$ and use it to show that $I(k, t) \le (1 + \sqrt{2})^{2 + \max(0,\, k - t + 3)} + 2(t - 1)$, which proves our claim.

An invocation with $t = 0$ by definition either executes neither Step 7 nor Step 8, or it executes Step 8. By considering the different cases of Step 8, we obtain the following recurrence for the case when $t = 0$:

$$
I(k, 0) \le
\begin{cases}
1 & \text{no recursion} \\
1 + 2I(k-1, 0) & \text{Case 8.1} \\
1 + I(k-1, 0) + I(k, 1) & \text{Case 8.2} \\
1 + 2I(k-1, 0) + I(k-2, 0) & \text{Cases 8.3, 8.6} \\
1 + 2I(k-2, 0) + 2I(k-1, 1) + 2I(k-2, 1) & \text{Case 8.4} \\
1 + 2I(k-3, 0) + 3I(k-2, 0) + 3I(k-2, 1) & \text{Case 8.5} \\
1 + I(k-1, 0) + I(k-2, 0) + 2I(k-1, 1) & \text{Case 8.7}
\end{cases}
$$

4 This is a fairly loose bound on $I(k, t)$, but it is easy to manipulate.

The values of the first arguments of all $I(\cdot, \cdot)$ terms are easily verified for Cases 8.1, 8.3, 8.4 and 8.6. For Case 8.2, we observe that $r \ge 2$ and the worst case arises when $r = 2$, giving the claimed recurrence. For Case 8.7, we observe again that $r \ge 2$. If $r > 2$, then $s_2 > 0$, giving the recurrence for this case. If $r = 2$, then $s_2$ may be 0, but the fourth recursive call cuts $s_2 + 1$ edges, thereby giving the same recurrence as when $r > 2$. For Case 8.5, finally, we have $r \ge 3$ and, once again, the minimum value, $r = 3$, is the worst case, which gives the recurrence.
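As an informal aside on where the base $1 + \sqrt{2}$ comes from, consider only the recurrence for Cases 8.3 and 8.6 and ignore both the additive constant and the distinction between $t = 0$ and $t = 1$ (simplifications made here for illustration; the proof itself treats all cases). Substituting $I(k) = x^k$ gives the characteristic equation

$$
x^k = 2x^{k-1} + x^{k-2}
\;\Longleftrightarrow\;
x^2 - 2x - 1 = 0
\;\Longleftrightarrow\;
x = 1 \pm \sqrt{2},
$$

whose positive root is $1 + \sqrt{2} \approx 2.414$, in line with the $O(2.42^k n)$ bound stated in Theorem 2.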

Next we argue about the correctness of all second arguments that are 1 in these recurrences. Each such term $I(\cdot, 1)$ corresponds to a recursive call with $a_0 \ne$ nil. Thus, in order to justify setting $t = 1$, we need to show that each such invocation executes Step 7 but not Step 8, which follows if, in this child invocation, the current invocation's sibling group exists and at least one additional edge cut in $F_2$ is required to make this sibling group agree between $F_1$ and $F_2$. We say that the sibling group agrees between $F_1$ and $F_2$ if $F_2$ does not contain a triple incompatible with $F_1$ and involving descendants of at least two members of the sibling group, and there are no two paths between leaves in $F_2$ that (i) belong to different components of $F_2$, (ii) overlap in $F_1$, and (iii) have at least one endpoint each that is a descendant of a member of the current sibling group. The only cases that make recursive calls with $a_0 \ne$ nil are Cases 8.2, 8.4, 8.5 and 8.7. We consider each case in turn.



In Case 8.2, consider a recursive call that cuts edges $e_{B_1}, \ldots, e_{B_{i-1}}, e_{B_{i+1}}, \ldots, e_{B_{r-1}}$, for some $1 \le i \le r - 1$. Assume w.l.o.g. that $i = 1$. After cutting edges $e_{B_2}, e_{B_3}, \ldots, e_{B_{r-1}}$, there still exist leaves $a_1' \in F_2^{a_1}$, $b_1' \in F_2^{B_1}$, and $a_r' \in F_2^{a_r}$ such that $F_1$ contains the triple $a_1' a_r' | b_1'$ while $F_2$ contains the triple $a_1' b_1' | a_r'$. Thus, the sibling group does not agree between $F_1$ and $F_2$ yet.

In Case 8.4, if we make only four recursive calls, we can ignore whether setting $a_0 = a_1$ or $a_0 = a_2$ in the third and fourth recursive calls translates into $t = 1$ for these recursive calls because even the recurrence $I(k, 0) = 1 + 4I(k - 2, 0)$ is bounded by the recurrence $I(k, 0) = 1 + 2I(k - 2, 0) + 2I(k - 2, 1) + 2I(k - 1, 1)$ for the case when we make six recursive calls. If we make six recursive calls, there exists a member $a_3$ of the current sibling group that is not a descendant of $l$. Thus, the conditions for making the fifth and sixth recursive calls imply that there exist leaves $a_3' \in F_2^{a_3}$ and $b_3' \notin F_2^{a_i}$, for all $1 \le i \le m$, such that $a_3' \sim_{F_2} b_3'$ and the path from $a_3'$ to $b_3'$ in $F_2$ is disjoint from $F_2^{l}$. After cutting $e_{B_1}$ and $e_{B_1'}$, there exist leaves $a_2' \in F_2^{a_2}$ and $b_2' \in F_2^{B_2}$ such that $a_2' \sim_{F_2 \div \{e_{B_1}, e_{B_1'}\}} b_2'$. Thus, we have two paths in different connected components (between $a_2'$ and $b_2'$ and between $a_3'$ and $b_3'$) that overlap in $F_1$, and the sibling group does not agree between $F_1$ and $F_2$ yet. After cutting $e_{a_1}$, we obtain overlapping paths as above if $a_2 \not\sim_{F_2} a_3$; otherwise $a_2' b_2' | a_3'$ is a triple of $F_2$ incompatible with $F_1$. Thus, once again the sibling group does not agree between $F_1$ and $F_2$. Similar arguments show that setting $t = 1$ is correct when cutting edge $e_{a_2}$ or edges $e_{B_2}$ and $e_{B_2'}$.



Figure 6: The different cases of Step 8 (Cases 8.1 to 8.7). Only the subtree of $F_2$ rooted in $l$ is shown. The left side shows a possible input for each case, the right side visualizes the cuts made in each recursive call. Whenever $a_0 \ne$ nil in a recursive call, it is shown as a hollow circle. The last two calls in Case 8.4 may or may not be made.

In Case 8.5, we claim that it is correct to set $t = 1$ for each recursive call that cuts edges $e_{B_1}, \ldots, e_{B_{i-1}}, e_{B_{i+1}}, \ldots, e_{B_r}$, for some $1 \le i \le r$. Assume w.l.o.g. that $i = 1$. After cutting edges $e_{B_2}, e_{B_3}, \ldots, e_{B_r}$, there exist leaves $a_1' \in F_2^{a_1}$, $b_1' \in F_2^{B_1}$, and $a_2' \in F_2^{a_2}$ such that $a_1' a_2' | b_1'$ is a triple of $F_1$ and $a_1' b_1' | a_2'$ is a triple of $F_2$. Thus, the sibling group does not agree between $F_1$ and $F_2$ yet.



In Case 8.7, observe that $l$ is not a root and its parent does not have a member of the current sibling group as a child. Thus, since $m > 2$, there exists an index $j$ and two leaves $a_j' \in F_2^{a_j}$ and $b_j' \notin F_2^{a_h}$, for all $1 \le h \le m$, such that $a_j' \sim_{F_2} b_j'$ and the path from $a_j'$ to $b_j'$ in $F_2$ is disjoint from the path between any two leaves in $F_2^{a_1}$ and $F_2^{B_{11}}$. After cutting $e_{a_2}$, there exist leaves $a_1' \in F_2^{a_1}$ and $b_1' \in F_2^{B_{11}}$ such that $a_1' \sim_{F_2 \div \{e_{a_2}\}} b_1'$. If $a_1' \sim_{F_2} a_j'$, we thus have a triple $a_1' b_1' | a_j'$ of $F_2$ incompatible with $F_1$. If $a_1' \not\sim_{F_2} a_j'$, the path between $a_1'$ and $b_1'$ overlaps the path between $a_j'$ and $b_j'$ in $F_1$. In either case, the current sibling group does not agree between $F_1$ and $F_2$ yet. This concludes the correctness proof of the recurrence for $I(k, 0)$.

For $t = 1$, we distinguish whether or not the current invocation $X$ makes a recursive call with $t = 0$ and whether it makes one or two recursive calls. If $X$ makes no recursive call with $t = 0$, we obtain $I(k, 1) \le 1 + 2I(k - 1, 1)$ because each case of Step 7 makes at most two recursive calls, with parameters no greater than $k - 1$. If $X$ makes only one recursive call, with $t = 0$, we obtain $I(k, 1) \le 1 + I(k - 1, 0)$ because this recursive call has parameter no greater than $k - 1$. Finally, if $X$ makes two recursive calls, at least one of them with $t = 0$, $X$ must have applied Case 7.2 or 7.4. Let $X'$ be one of the invocations $X$ makes with $t = 0$. If $t = 0$ for invocation $X'$ because $X'$ terminates in Step 1 or 2, we obtain $I(k, 1) \le 2 + I(k - 1, 0)$ by counting invocations $X$ and $X'$ and the number of recursive calls spawned by the sibling invocation of $X'$, which cannot be more than $I(k - 1, 0)$. So assume that $t = 0$ for invocation $X'$ and that $X'$ does make further recursive calls. Then the sibling group chosen in invocation $X$ must agree between the input forests of invocation $X'$.



If invocation $X$ applies Case 7.2 and makes two recursive calls, we observe that $m \ge 3$ because $a_0$ is a member of $X$'s sibling group, $a_0$ is not a descendant of $l$, and $l$ has at least two descendants in the sibling group. Furthermore, $s_1 > 0$. Thus, after cutting $e_{a_1}$, $a_2$ has a sibling forest $B_2'$ that does not include $a_0$. Since $a_0$ also has a sibling forest $B_0$ that does not include $a_2$, $X$'s sibling group cannot agree between $F_1$ and $F_2$ after cutting $e_{a_1}$. This implies that $t = 1$ for the first recursive call Maf($F_1, F_2 \div \{e_{a_1}\}, k - 1, a_0$), and $X'$ is the second recursive call Maf($F_1, F_2 \div \{e_{B_{11}}, e_{B_{12}}, \ldots, e_{B_{1 s_1}}\}, k - s_1, a_0$). This gives the recurrence $I(k, 1) = 1 + I(k - 1, 1) + I(k - s_1, 0)$. Since no two members of $X$'s sibling group are siblings in $F_2$ and $a_0$ is not a descendant of $l$, cutting edges $e_{B_{11}}, e_{B_{12}}, \ldots, e_{B_{1 s_1}}$ can make $X$'s sibling group agree between $F_1$ and $F_2$ only if $r = 2$, $s_2 = 0$, and either $l$ is a root of $F_2$ or the only pendant nodes of the path from $l$ to the root of its component in $F_2$ are members of $X$'s sibling group. Thus, since we assume we make two recursive calls, we must have $s_1 \ge 2$, that is, the recurrence for this case is $I(k, 1) \le 1 + I(k - 1, 1) + I(k - 2, 0)$.



Finally, if invocation $X$ applies Case 7.4, observe that, since $a_0$ is a descendant of $l$ and has a group of sibling trees $B_0$ that do not contain any member $a_i$ of $X$'s sibling group, this sibling group can be made to agree between $F_1$ and $F_2$ only by cutting $e_{a_x}$. Moreover, since no member $a_i$ of $X$'s sibling group is a root of $F_2$, cutting $e_{a_x}$ can make this sibling group agree between $F_1$ and $F_2$ only if $m = 2$. Thus, Case 7.4 makes only one recursive call and we obtain $I(k, 1) = 1 + I(k - 1, 0)$ in this case.

By combining the different possibilities for the case when $t = 1$, we obtain the recurrence

$$
I(k, 1) \le \max\bigl(1 + 2I(k - 1, 1),\; 2 + I(k - 1, 0),\; 1 + I(k - 1, 1) + I(k - 2, 0)\bigr).
$$

Simple substitution now shows that $I(k, t) \le (1 + \sqrt{2})^{2 + \max(0,\, k - t + 3)} + 2(t - 1)$. □
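The substitution can also be checked numerically for small $k$. The sketch below evaluates the recurrences for $I(k, 0)$ and $I(k, 1)$ as equalities, taking the maximum over all cases, and compares the result with the closed-form bound. It assumes, for illustration, that an invocation with $k < 0$ counts as a single invocation without recursion; the names `worst` and `bound` are not from the paper, and a finite check of this kind does not replace the inductive argument.

```python
import math
from functools import lru_cache

PHI = 1 + math.sqrt(2)


@lru_cache(maxsize=None)
def worst(k: int, t: int) -> int:
    """Largest value permitted by the recurrences, read as equalities."""
    if k < 0:  # assumption: the invocation stops in Step 1 and counts once
        return 1
    if t == 1:
        return max(
            1 + 2 * worst(k - 1, 1),
            2 + worst(k - 1, 0),
            1 + worst(k - 1, 1) + worst(k - 2, 0),
        )
    return max(
        1,                                                                    # no recursion
        1 + 2 * worst(k - 1, 0),                                              # Case 8.1
        1 + worst(k - 1, 0) + worst(k, 1),                                    # Case 8.2
        1 + 2 * worst(k - 1, 0) + worst(k - 2, 0),                            # Cases 8.3, 8.6
        1 + 2 * worst(k - 2, 0) + 2 * worst(k - 1, 1) + 2 * worst(k - 2, 1),  # Case 8.4
        1 + 2 * worst(k - 3, 0) + 3 * worst(k - 2, 0) + 3 * worst(k - 2, 1),  # Case 8.5
        1 + worst(k - 1, 0) + worst(k - 2, 0) + 2 * worst(k - 1, 1),          # Case 8.7
    )


def bound(k: int, t: int) -> float:
    """Closed-form bound (1 + sqrt 2)^(2 + max(0, k - t + 3)) + 2(t - 1)."""
    return PHI ** (2 + max(0, k - t + 3)) + 2 * (t - 1)


def bound_holds(k_max: int) -> bool:
    return all(worst(k, t) <= bound(k, t)
               for k in range(-3, k_max + 1) for t in (0, 1))
```

Calling `bound_holds(30)` compares the two quantities for every $-3 \le k \le 30$ and both values of $t$.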
Lemma 12. For two rooted X-trees $T_1$ and $T_2$ and a parameter $k$, the invocation Maf($T_1, T_2, k$, nil) returns "yes" if and only if $d_{sSPR}(T_1, T_2) = e(T_1, T_2, T_2) \le k$.

Proof. We use induction on $k$ to prove the following two claims, which together imply the lemma: (i) If $e(T_1, T_2, F_2) > k$, the invocation Maf($F_1, F_2, k, a_0$) returns "no". (ii) If $e(T_1, T_2, F_2) \le k$ and either $a_0 =$ nil or there exists an MAF $F$ of $F_1$ and $F_2$ such that $a_0$ is not a root of $F$ and $a_0 \not\sim_F a_i$, for every sibling $a_i$ of $a_0$ in $F_1$, the invocation Maf($F_1, F_2, k, a_0$) returns "yes".

(i) Assume $e(T_1, T_2, F_2) > k$. If $k < 0$, the invocation returns "no" in Step 1. If $k \ge 0$, assume for the sake of contradiction that the invocation returns "yes". If it does so in Step 2, then $F_2$ is an AF of $T_1$ and $T_2$, that is, $e(T_1, T_2, F_2) = 0 \le k$, a contradiction. Otherwise it returns "yes" in Step 7 or 8. Thus, there exists a child invocation Maf($F_1', F_2', k', a_0'$) that returns "yes", where $F_2' = F_2 \div E$ and $k' = k - |E|$, for some non-empty edge set $E$. By the inductive hypothesis, we therefore have $e(T_1, T_2, F_2') \le k'$ and, hence, $e(T_1, T_2, F_2) \le k' + |E| = k$, again a contradiction.

(ii) Assume $e(T_1, T_2, F_2) \le k$ and either $a_0 =$ nil or there exists an MAF $F$ of $F_1$ and $F_2$ such that $a_0$ is not a root of $F$ and $a_0 \not\sim_F a_i$, for every sibling $a_i$ of $a_0$ in $F_1$. In particular, $k \ge 0$ and the invocation Maf($F_1, F_2, k, a_0$) produces its answer in Step 2, 7 or 8. If it produces its answer in Step 2, it answers "yes". Next we consider Steps 7 and 8 and prove that at least one of the recursive calls made in each case returns "yes", which implies that the current invocation returns "yes".

In Step 7, $a_0 \ne$ nil, that is, $a_0$ is not a root of $F$ and $a_0 \not\sim_F a_i$, for every sibling $a_i$ of $a_0$ in $F_1$. In Case 7.1, $a_x \not\sim_{F_2} a_i$ and, hence, $a_x \not\sim_F a_i$, for all $i \ne x$. Thus, $a_x$ is a root of $F$ because otherwise the components of $F$ containing $a_x$ and $a_0$ would overlap in $F_1$. This implies that there exists an edge set $E$ such that $e_{a_x} \in E$ and $F = F_2 \div E$, that is, the recursive call Maf($F_1, F_2 \div \{e_{a_x}\}, k - 1, a_0$) returns "yes".

In Case 7.2, $a_1' \not\sim_F b_1'$, for all leaves $a_1' \in F_2^{a_1}$ and $b_1' \in F_2^{B_{1j}}$, $1 \le j \le s_1$, because the path between $a_1'$ and $b_1'$ would overlap the component containing $a_0$. Thus, there exists an edge set $E$ such that $F_2 \div E = F$ and either $e_{a_1} \in E$ or $\{e_{B_{11}}, e_{B_{12}}, \ldots, e_{B_{1 s_1}}\} \subseteq E$. If we make two recursive calls in Case 7.2, this shows that one of them returns "yes". If we make only one recursive call, we have $s_1 = 1$ and $s_r = 0$. Assume this recursive call returns "no". Then $e_{a_1} \in E$. Let $F'$ be the forest obtained by cutting $e_{B_1}$ instead of $e_{a_1}$, resolving the pair $\{a_1, a_r\}$, cutting edge $e_{(a_1, a_r)}$ instead of $e_{a_r}$ if $e_{a_r} \in E$, and otherwise cutting the same edges as in $E$. $F$ and $F'$ are identical, except that in $F'$, $a_1$ and $a_r$ are siblings and no leaf in $F_2^{B_1}$ can reach a leaf not in $F_2^{B_1}$. The former cannot introduce any triples incompatible with $F_1$ because $a_1$ and $a_r$ are siblings in $F_1$. The latter cannot introduce any overlapping components because this would imply that $a_1' \sim_F b_1'$, for two leaves $a_1' \in F_2^{a_1}$ and $b_1' \in F_2^{B_1}$, and we already argued that no such path can exist. Thus, $F'$ is also an MAF of $F_1$ and $F_2$. Finally, $a_0$ is not a root in $F'$ and $a_0 \not\sim_{F'} a_i$ because otherwise $a_0 \sim_F a_r$. Thus, since there exists an edge set $E'$ such that $F' = F_2 \div E'$ and $e_{B_1} \in E'$, the recursive call Maf($F_1, F_2 \div \{e_{B_1}\}, k - 1, a_0$) returns "yes", a contradiction.



In Case 7.3, there exists an edge set $E$ such that $F = F_2 \div E$ and $E \cap \{e_{a_x}, e_{B_x}\} \ne \emptyset$. Otherwise we would have $a_x \sim_F a_0$ or $a_x' \sim_F b_x'$, for two leaves $a_x' \in F_2^{a_x}$ and $b_x' \notin F_2^{a_i}$, for all $1 \le i \le m$, but we have $a_x \not\sim_F a_0$ and the path between $a_x'$ and $b_x'$ would overlap the component of $F$ that contains $a_0$. Now, if $e_{B_x} \notin E$, we construct an MAF $F'$ of $F_1$ and $F_2$ such that $a_0$ is not a root in $F'$ and $a_0 \not\sim_{F'} a_i$, for all $1 \le i \le m$, and an edge set $E'$ such that $F' = F_2 \div E'$ and $e_{B_x} \in E'$ as in Case 7.2. Thus, the invocation Maf($F_1, F_2 \div \{e_{B_x}\}, k - 1, a_0$) returns "yes".



In Case 7.4, finally, one of the two recursive calls Maf($F_1, F_2 \div \{e_{a_x}\}, k - 1, a_0$) or Maf($F_1, F_2 \div \{e_{B_{x1}}, e_{B_{x2}}, \ldots, e_{B_{x s_x}}\}, k - s_x, a_0$) must return "yes", by the same arguments as in Case 7.2. Thus, if $m > 2$ and $r > 2$, one of the recursive calls we make returns "yes". If $m > 2$ and $r = 2$, we observe that after cutting edges $e_{B_{x1}}, e_{B_{x2}}, \ldots, e_{B_{x s_x}}$, $a_x$ and $a_0$ satisfy the conditions of Case 7.3, which shows that we can cut edge $e_{B_x'}$ immediately after cutting these edges. Thus, if the recursive call Maf($F_1, F_2 \div \{e_{a_x}\}, k - 1, a_0$) returns "no", the recursive call Maf($F_1, F_2 \div \{e_{B_{x1}}, e_{B_{x2}}, \ldots, e_{B_{x s_x}}, e_{B_x'}\}, k - (s_x + 1), a_0$) must return "yes" in this case. Finally, if $m = 2$, since $a_x \not\sim_F a_0$ and any path between two leaves $a_x' \in F_2^{a_x}$ and $b_x' \notin F_2^{a_x} \cup F_2^{a_0}$ would overlap the component of $F$ that contains $a_0$, $a_x$ is a root in $F$. Thus, the invocation Maf($F_1, F_2 \div \{e_{a_x}\}, k - 1, a_0$) returns "yes" in this case.

Now consider Step 8. In Cases 8.1, 8.2, 8.3, and 8.5, Lemmas 5, 7, 9, and 8, respectively, and the inductive hypothesis show that one of the recursive calls returns "yes" and, thus, the current invocation returns "yes".

In Case 8.4, Lemma 8 and the inductive hypothesis show that one of the recursive calls Maf($F_1, F_2 \div \{e_{a_1}, e_{a_2}\}, k - 2$, nil), Maf($F_1, F_2 \div \{e_{B_1}, e_{B_2}\}, k - 2$, nil), Maf($F_1, F_2 \div \{e_{B_1}\}, k - 1, a_2$), Maf($F_1, F_2 \div \{e_{B_2}\}, k - 1, a_1$), Maf($F_1, F_2 \div \{e_{a_1}\}, k - 1, a_2$) or Maf($F_1, F_2 \div \{e_{a_2}\}, k - 1, a_1$) would return "yes". Thus, we need to argue only that we can cut both $e_{B_i}$ and $e_{B_i'}$ in the third and fourth recursive calls, and that the last two recursive calls are not necessary when we do not make them.

First consider cutting $e_{B_1}$ and setting $a_0 = a_2$ in the third recursive call. We require this call to return "yes" only if the other calls return "no". The forest $F_2' = F_2 \div \{e_{B_1}\}$ contains a triple $a_1' | a_2' b_2'$, where $a_1' \in F_2^{a_1}$, $a_2' \in F_2^{a_2}$, and $b_2' \in F_2^{B_2}$, while $F_1$ contains the triple $a_1' a_2' | b_2'$. Thus, in order to obtain an MAF of $F_1$ and $F_2'$ where $a_2$ exists and is not a root, we need to cut either $e_{a_1}$ or $e_{B_1'}$. Since the first and fifth recursive calls return "no", however, we know that cutting $e_{a_1}$ cannot lead to an MAF of $F_1$ and $F_2'$, and we can cut $e_{B_1'}$ along with edge $e_{B_1}$. The case when we cut $e_{B_2'}$ along with edge $e_{B_2}$ is analogous.

If we do not make the recursive call Maf(Fi,F 2 4- {e ai },k — l,a 2 ), then I has exactly one 
sibling, ai, and its parent pi is either a root or has exactly one sibling, a,j. Thus, the forest 
F' 2 = F 2 4- {e ai } contains a triple a^b'^a^, where a 2 G F 2 2 , b 2 G F 2 Bi , and a'j G i 7 ^, while Fi 
contains the triple a 2 a^|6 2 . In order to obtain an MAF of F\ and Fg where a 2 exists and is not a 
root, wg therefore need to cut either c a - or e Bi = e\. If pi is a root, cutting either edge has the 
same effect. If pi has a sibling aj and we cut e a . , we can obtain an alternate MAF by cutting 
e\ instead of e ai , resolving {ai,aj}, cutting edge e( ai ,aj) instead of e aj if e a . G F, and otherwise 
cutting the same edges as in F. Thus, we can always replace the recursive call Maf(Fi,F 2 4 
{e ai }, k — 1, a 2 ) with the call Maf(Fi, F 2 4- {e ai , e;}, k — 2, a 2 ) without affecting the correctness of 
the algorithm. If this call returns "yes", however, then so does the call Maf(Fi,F 2 4 {e Bl ,e B / i }, 
k — 2, a 2 ) because we can yet again obtain an alternate MAF by cutting edges e Bl and e B2 instead 
of ede es 6 ai and 6;, resolving {ox,cij}, cutting C( ai)(1 ^ instead of c ai if e ai G F, and otherwise cutting 
the same edges as in F. Thus, the call Maf(Fi, F 2 4 {e ai }, fe — 1, a 2 ) can be eliminated altogether. 
An analogous argument shows that we can eliminate the call Maf(Fi, F 2 4 {e a2 }, k — 1, ai). 

In Case 8.6, Lemma 11 and the inductive hypothesis show that one of the recursive calls
$\textsc{Maf}(F_1, F_2 \div \{e_{a_1}\}, k - 1, \mathrm{nil})$, $\textsc{Maf}(F_1, F_2 \div \{e_{a_2}\}, k - 1, \mathrm{nil})$,
$\textsc{Maf}(F_1, F_2 \div \{e_{B_{11}}, e_{B_{12}}, \ldots, e_{B_{1s_1}}\}, k - s_1, \mathrm{nil})$ or $\textsc{Maf}(F_1, F_2 \div \{e_{B_2}\}, k - 1, \mathrm{nil})$
would return "yes". We need to show that the call $\textsc{Maf}(F_1, F_2 \div \{e_{a_2}\}, k - 1, \mathrm{nil})$ is not
necessary. To see this, observe that, if $l$ is a root, then cutting $e_{a_2}$ or $e_{B_2}$ has the same
effect. If $l$ is not a root but has a member $a_j$ of the current sibling group as a sibling, then we
can obtain an alternate MAF by cutting $e_{B_2}$ instead of $e_{a_2}$, resolving $\{a_2, a_j\}$, cutting
$e_{(a_2,a_j)}$ instead of $e_{a_j}$ if $e_{a_j} \in F$, and otherwise cutting the same edges as in $F$.



In Case 8.7, finally, the correctness follows from Lemmas 10 and 11 if we can show that setting
$a_0 = a_1$ is correct for the second and fourth recursive calls. This, however, follows because, if
neither the first nor the third recursive call returns "yes", then in every MAF $F$ of $F_1$ and $F_2$,
$a_1$ exists and there exist two leaves $a'_1 \in F_2^{a_1}$ and $b'_1 \in F_2^{B_{1j}}$, for some $1 \le j \le s_1$, such that
$a'_1 \sim_F b'_1$. □



The proof of the following theorem is provided in Section S8 of the supplementary material.

Theorem 3. Given two rooted X-trees $T_1$ and $T_2$, a 3-approximation of $e(T_1, T_2, T_2) = d_{SPR}(T_1, T_2)$
can be computed in $O(n \log n)$ time.



5 Conclusions 

We developed efficient algorithms for computing MAFs of multifurcating trees. Our fixed-parameter
algorithm achieves the same running time as in the binary case and our 3-approximation algorithm
achieves a running time of $O(n \log n)$, almost matching the linear running time for the binary case.
Implementing and testing our algorithms will be the focus of future work.

Two other directions to be explored by future work are practical improvements of the running
time of the FPT algorithm presented here and extending our FPT algorithm so it can be used to
compute maximum acyclic agreement forests (MAAFs) and, hence, the hybridization number of
multifurcating trees. To speed up our FPT algorithm for computing MAFs, it may be possible to
extend the reduction rules used by Linz and Semple [16] for computing MAAFs of multifurcating
trees so they can be applied to MAF computations, and combine them with the FPT algorithm in
this paper. The fastest fixed-parameter algorithms for computing MAAFs of binary trees [10-12] are
extensions of the binary MAF algorithms of Whidden et al. [8, 9]. These algorithms were developed
by examining which search branches of the binary MAF algorithm get "stuck" with cyclic agreement
forests; they either cut additional edges to avoid these cycles [10] or refine cyclic agreement
forests to acyclic agreement forests [11, 12]. Similarly, Chen and Wang [21] recently extended the
MAF fixed-parameter algorithm for two binary trees to compute agreement forests of multiple
binary trees using an iterative branching approach. The proofs of various structural lemmas in this
paper show that any MAF can be obtained by cutting an edge set that includes certain edges. To
prove this, we started with an arbitrary MAF and an edge set such that cutting these edges yields
this MAF, and then we modified this edge set so that it includes the desired set of edges without
changing the resulting forest. As in the binary case [12], the same lemmas apply also to MAAFs;
since the modifications in the proofs do not change the resulting AF, the only needed change in the
proof is to start with an edge set such that cutting it yields an MAAF. Thus, numerous lemmas in
this paper may also form the basis for an efficient algorithm for computing MAAFs.

Finally, we note that our fixed-parameter algorithm becomes greatly simplified when comparing
a binary tree to a multifurcating tree. This is common in practice when, for example, comparing
many multifurcating gene trees to a binary reference tree or binary supertree. To see this, suppose
$F_1$ is binary, that is, $m = 2$ in every case of the FPT algorithm. Then only Cases 8.1, 8.2 and 8.3
apply in Step 8, and Step 7 never applies. Using our observation that cutting $e_{B_2}$ is not necessary in
Case 8.2 when $m = 2$, our algorithm becomes similar to the MAF algorithm for binary trees [9]. We
further note, in the interest of practical efficiency, that cutting $e_{a_2}$ is unnecessary in this case when
the parent of $a_1$ is a binary node (and, indeed, our algorithm is then identical to the algorithm of [9]
when applied to two binary trees).
S6 Omitted Proofs

S6.1 Lemma 3

The only difference between $F_2$ and $F'_2$ is the expansion of $\{a_{p+1}, a_{p+2}, \ldots, a_m\}$ in $F'_2$, so
$e(F_1, F_2, F_2) \le e(F_1, F_2, F'_2)$. Since $F \div E$ is an MAF of $F_1$ and $F_2$, it suffices to show that
$F \div E$ is an AF of $F_1$ and $F'_2$ to prove that $e(F_1, F_2, F'_2) \le e(F_1, F_2, F_2)$. Assume the contrary.
Then, since $F \div E$ is a forest of $F_1$, it cannot be a forest of $F'_2$. Since the only difference
between $F_2$ and $F'_2$ is the expansion of $\{a_{p+1}, a_{p+2}, \ldots, a_m\}$, this implies that some component of
$F \div E$ contains leaves $a'_i \in F_2^{a_i}$ and $a'_j \in F_2^{a_j}$, for some $1 \le i \le p$ and $p + 1 \le j \le m$,
contradicting that $a'_i \not\sim_{F \div E} a'_j$ for all such leaves.

S6.2 Lemma 6

As in the proof of Theorem 1, (ii) follows immediately from (i), so it suffices to prove that there
exist a binary resolution $F$ of $F_2$ and an edge set $E$ of size $e(T_1, T_2, F_2)$ such that $F \div E$ is an MAF
of $T_1$ and $T_2$ (and, hence, of $F_1$ and $F_2$) and $E \cap \{e_{a_1}, e_{a_2}, e_{B_1}\} \ne \emptyset$. Once again, we show that,
if $E \cap \{e_{a_1}, e_{a_2}, e_{B_1}\} = \emptyset$, we can replace an edge $f \in E$ with an edge in $\{e_{a_1}, e_{a_2}, e_{B_1}\}$ without
changing $F \div E$.

This follows from the same arguments as in the proof of Theorem [T] unless there exist leaves 
a 'i e , a' 2 £ F 2 2 and b\ £ F 2 Bl such that a\ ^f^e Va x ^f~e o'\ and a! 2 ~f-^_b Pa 2 - I n this 
case, since ai is not a child of we have a 2 ^ F 2 Bl . Thus, since {01, 02} is a sibling pair in F\, F\ 
contains the triple a^a^lft^. Since F 2 contains the triple a^fe^a^ and a[ ^f+e b'i, this implies that 
a[ oo F ^_ E a 2 . Thus, we also have a 2 ^f+e x, for all x G F 2 \ F 2 2 , as otherwise the components of 
F 4- E containing a' 1; and a 2 , x would overlap in F\. We choose an arbitrary leaf b 2 G B 2 and the 
first edge f £ E on the path from p a2 to b 2 . Lemma [T] implies that F J ^E = F J ^(E\ {/} U {e a2 }). 
S6.3 Lemma 10

As in the proof of Lemma 9, we prove this using induction on $s = s_1 + s_2$. The base case is $s = 2$
and, hence, $s_1 = s_2 = 1$ because $r > 2$ implies that $s_1 > 0$ and $s_2 > 0$. In this case, Theorem 1
proves the lemma. So assume $s > 2$ and the claim holds for all $1 < s' < s$. By Theorem 1, there
exist a resolution $F$ of $F_2$ and an edge set $E$ of size $e(T_1, T_2, F_2)$ such that $F \div E$ is an AF of $F_1$
and $F_2$ and $E \cap \{e_{a_1}, e_{a_2}, e_{B_{11}}, e_{B_{21}}\} \ne \emptyset$. Using an inductive argument as in the proof of Lemma 9,
it follows that there exist a resolution $F'$ of $F_2$ and an edge set $E'$ satisfying the lemma.



S6.4 Lemma 11

First consider the case when $s_2 = 0$, which is possible because $r = 2$. Then $B'_2 = B_2$ and, by
Theorem 1, there exist a resolution $F$ of $F_2$ and an edge set $E$ of size $e(T_1, T_2, F_2)$ such that $F \div E$
is an AF of $T_1$ and $T_2$ and $E \cap \{e_{a_1}, e_{a_2}, e_{B_{11}}, e_{B_2}\} \ne \emptyset$. If $E \cap \{e_{a_1}, e_{a_2}, e_{B_2}\} \ne \emptyset$, the lemma
holds. Otherwise $e_{B_{11}} \in E$ and an inductive argument similar to the one in the proof of Lemma 9
proves the lemma.



If $s_2 > 0$, we observe that the proof of Lemma 10 did not use the assumption that $r > 2$ but
only that it implies $s_2 > 0$. Hence, this proof shows that there exist a resolution $F$ of $F_2$ and an
edge set $E$ of size $e(T_1, T_2, F_2)$ such that $F \div E$ is an AF of $T_1$ and $T_2$ and $E \cap \{e_{a_1}, e_{a_2}\} \ne \emptyset$,
$\{e_{B_{11}}, e_{B_{12}}, \ldots, e_{B_{1s_1}}\} \subseteq E$ or $\{e_{B_{21}}, e_{B_{22}}, \ldots, e_{B_{2s_2}}\} \subseteq E$. Among all such resolutions and
edge sets, choose $F$ and $E$ so that $E \cap \{e_{a_1}, e_{a_2}\} \ne \emptyset$ or $\{e_{B_{11}}, e_{B_{12}}, \ldots, e_{B_{1s_1}}\} \subseteq E$ if possible.
If we can find such a pair $(F, E)$, the lemma holds. Otherwise $\{e_{B_{21}}, e_{B_{22}}, \ldots, e_{B_{2s_2}}\} \subseteq E$. Let $F'$
be the forest obtained from $F_2$ by resolving $B_{21}, B_{22}, \ldots, B_{2s_2}$ and cutting edges
$e_{B_{21}}, e_{B_{22}}, \ldots, e_{B_{2s_2}}$. In $F'$, we have $r = 2$ and $s_2 = 0$. Hence, by the argument in the previous
paragraph, there exist a resolution $F''$ of $F'$ and an edge set $E''$ of size $e(T_1, T_2, F')$ such that
$F'' \div E''$ is an AF of $T_1$ and $T_2$ and $E'' \cap \{e_{a_1}, e_{a_2}\} \ne \emptyset$, $\{e_{B_{11}}, e_{B_{12}}, \ldots, e_{B_{1s_1}}\} \subseteq E''$ or
$e_{B'_2} \in E''$. In the first two cases, we obtain a contradiction to the choice of $F$ and $E$. In the
latter case, the set $\{e_{B_{21}}, e_{B_{22}}, \ldots, e_{B_{2s_2}}\} \cup E''$ has size $s_2 + e(T_1, T_2, F') = e(T_1, T_2, F_2)$,
$F_2 \div (\{e_{B_{21}}, e_{B_{22}}, \ldots, e_{B_{2s_2}}\} \cup E'')$ is an AF of $T_1$ and $T_2$, and
$\{e_{B_{21}}, e_{B_{22}}, \ldots, e_{B_{2s_2}}, e_{B'_2}\} \subseteq \{e_{B_{21}}, e_{B_{22}}, \ldots, e_{B_{2s_2}}\} \cup E''$. Thus, the lemma holds in this
case as well.



S7 Linear Time Per Invocation 

We represent each forest as a collection of nodes, each of which points to its parent, to its leftmost
child, and to its left and right siblings. This allows us to cut an edge in constant time, given the
parent and child connected by this edge. Every labelled node (i.e., every node in $R_t$ or $R_d$) stores a
pointer to its counterpart in the other forest. For $T_1$, we maintain a list of sibling groups of labelled
nodes. For each such group, the list stores a pointer to the parent of the sibling group, which allows
us to access the members of the sibling group by traversing the list of the parent's children. To
detect the creation of such a sibling group, and add it to the list, each internal node of $T_1$ stores
the number of its unlabelled children. When labelling a non-root node, we decrease its parent's
unlabelled children count by one. If this count is now 0, the children of this parent node form a new
sibling group, and we add a pointer to the parent to the list of sibling groups. For $F_2$, we maintain
a list $R'_d \subseteq R_t$ of labelled nodes that are roots of $F_2$. This list is used to move these roots from $R_t$
to $R_d$.
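The pointer layout described above can be sketched as follows. This is only an illustration under stated assumptions: the names `Node`, `add_child`, `cut_edge` and `children` are hypothetical and do not come from the paper, and the sketch omits the contraction of degree-2 nodes, the cross-forest pointers, and the sibling-group list.

```python
class Node:
    """Forest node with parent, leftmost-child, and left/right sibling pointers."""

    def __init__(self, label=None):
        self.label = label
        self.parent = None
        self.first_child = None  # leftmost child
        self.left = None         # left sibling
        self.right = None        # right sibling


def add_child(parent, child):
    """Attach child as the new leftmost child of parent in O(1)."""
    child.parent = parent
    child.left = None
    child.right = parent.first_child
    if parent.first_child is not None:
        parent.first_child.left = child
    parent.first_child = child


def cut_edge(child):
    """Cut the edge between child and its parent in O(1); child becomes a root."""
    parent = child.parent
    if parent is None:
        return
    if child.left is not None:
        child.left.right = child.right
    else:
        parent.first_child = child.right
    if child.right is not None:
        child.right.left = child.left
    child.parent = child.left = child.right = None


def children(node):
    """Iterate over the children of node from left to right."""
    c = node.first_child
    while c is not None:
        yield c
        c = c.right
```

Because every node knows both of its neighbours in the child list, unlinking it touches a constant number of pointers, which is the property the constant-time edge cut relies on.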

Steps 1-4 are implemented similarly to the algorithm for binary trees [9]. Step 1 clearly takes
constant time. In Step 2, we can test in constant time whether $|R_t| \le 1$ by inspecting at most two
nodes in the first two sibling groups. Step 3 takes constant time to test whether the root list $R'_d$
is empty and, if it is not, cut the appropriate edge in $T_1$ and update a constant number of lists
and pointers. Step 4 takes constant time using the list of sibling groups. We always choose the
next sibling group from the beginning of this list and append new sibling groups to the end. This
automatically gives preference to the most recently chosen sibling group as required in Step 4.

Step 5 requires some care to implement efficiently. We iterate over the members $a_1, a_2, \ldots, a_m$
of the current sibling group and mark their parents in $F_2$. Initially, all nodes in $F_2$ are unmarked.
When inspecting a node $a_i$ whose parent $p_{a_i}$ in $F_2$ is unmarked, we mark $p_{a_i}$ with $a_i$. If $p_{a_i}$ is
already marked with a node $a_j$ ($j < i$), then $a_i$ and $a_j$ are siblings in $T_1$ and $F_2$. We resolve them
in constant time and mark $p_{a_i}$ (which is now the grandparent of $a_i$) with the new parent $(a_j, a_i)$ of
$a_i$ and $a_j$. Since we spend constant time per member $a_i$ of the sibling group, this procedure takes
$O(m)$ time. Once it finishes, the remaining members of the sibling group are not siblings in $F_2$.
Performing a contraction if the remaining sibling group has only one member takes constant time.
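The marking procedure of Step 5 can be sketched as follows. This is a simplified illustration, not the paper's implementation: it assumes $F_2$ is given as a hypothetical dictionary `parent` mapping each node to its parent (roots are absent or map to `None`), represents a resolved pair as a Python tuple, and ignores the corresponding update in the other forest and any contractions.

```python
def resolve_sibling_pairs(members, parent):
    """Merge members of a sibling group that share a parent in F2.

    members: list of node identifiers a_1, ..., a_m (hashable).
    parent:  dict mapping a node to its parent in F2; updated in place.
    Returns the members that remain after merging, in O(m) time.
    """
    mark = {}   # parent node -> member (or merged node) that marked it
    roots = []  # members without a parent are left untouched
    for a in members:
        p = parent.get(a)
        if p is None:
            roots.append(a)
        elif p not in mark:
            mark[p] = a                  # first member seen below p
        else:
            b = mark[p]                  # earlier member with the same parent
            merged = (b, a)              # new parent of b and a
            parent[merged] = p
            parent[b] = merged
            parent[a] = merged
            mark[p] = merged             # p is now marked with the merged node
    return list(mark.values()) + roots
```

Each member is inspected once and each inspection performs a constant number of dictionary operations, which mirrors the $O(m)$ bound stated above.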

In Step 6, we perform a linear-time traversal of $F_2$ to label every node $x$ with the number $r_x$
of members of the current sibling group among its descendants. Then, if the previously chosen
minimal LCA $l$ still exists in $F_2$ and has at least two descendants in the current sibling group, we
keep this choice of $l$. Otherwise a node $x$ is a minimal LCA of a subset of the current sibling group
if and only if $r_x \ge 2$ and $r_y \le 1$, for each child $y$ of $x$. If there is no such node $x$, we proceed to
Step 7 without choosing $l$ because $a_i \not\sim_{F_2} a_j$, for all $1 \le i < j \le m$. Otherwise we pick any node $x$
satisfying this condition as the new minimal LCA $l$. No matter whether $l$ is the previously chosen
minimal LCA or a new node, we set $r = r_l$ and traverse the paths from $l$ to its descendant members
of the sibling group, $a_1, a_2, \ldots, a_r$. We do this by visiting all descendants $y$ of $l$ such that $r_y = 1$.
For all $1 \le i \le r$, the length of the path from $l$ to $a_i$, excluding $l$ and $a_i$, is $s_i$. We sort $a_1, a_2, \ldots, a_r$
by their path lengths $s_1, s_2, \ldots, s_r$ using Counting Sort [22]. Since $s_i \le n$, for all $1 \le i \le r$, this
takes linear time.
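A minimal sketch of the counting traversal and the minimal-LCA test follows. It assumes a hypothetical adjacency representation (`children` maps a node to a list of its children) rather than the pointer structure used in the paper, and it sorts in ascending order of path length, which is an assumption of this sketch.

```python
def count_members(root, children, members):
    """Return r with r[x] = number of group members in the subtree of x."""
    r = {}
    stack = [(root, False)]
    while stack:
        x, done = stack.pop()
        if done:
            # All children of x have been processed at this point.
            r[x] = (1 if x in members else 0) + sum(r[y] for y in children.get(x, ()))
        else:
            stack.append((x, True))
            for y in children.get(x, ()):
                stack.append((y, False))
    return r


def minimal_lcas(r, children):
    """Nodes x with r[x] >= 2 whose children y all satisfy r[y] <= 1."""
    return [x for x in r
            if r[x] >= 2 and all(r[y] <= 1 for y in children.get(x, ()))]


def counting_sort_by_length(items, length, max_len):
    """Stable counting sort of items by length[item], with 0 <= length <= max_len."""
    buckets = [[] for _ in range(max_len + 1)]
    for it in items:
        buckets[length[it]].append(it)
    return [it for bucket in buckets for it in bucket]
```

The traversal visits every node once, and the counting sort uses one bucket per possible path length, so both pieces stay linear in the size of the forest.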

To distinguish between Steps 7 and 8, it suffices to examine $a_0$. We distinguish between the
cases in Steps 7 and 8 using the values of $r$, $m$, and $s_1, s_2, \ldots, s_r$ and, in Step 7, by testing whether
$a_0$ is among the descendants of $l$. In each case, we can easily copy the forests, cut the appropriate
edges, and update our lists and pointers in linear time for each of the recursive calls.

To summarize: Each execution of Steps 1-4 takes constant time. Step 1 is executed once per
invocation. Steps 2-4 are executed at most a linear number of times per invocation because each
execution, except the first one, is the result of finding a root of $F_2$ in Step 3 or resolving sibling
pairs in Step 5, both of which can happen only $O(n)$ times. Each execution of Step 5 takes $O(m)$
time. In a given invocation, Step 5 is executed at most once per sibling group (because we either
proceed to Step 6 or return to Step 2 after completely resolving the sibling group). Thus, since
the total size of all sibling groups is bounded by $|T_1|$, the total cost of all executions of Step 5 per
invocation is $O(n)$. Steps 6-8 are executed at most once per invocation and take linear time. Thus,
each invocation of the algorithm takes linear time.

S8 A 3-Approximation Algorithm for Rooted MAF

We now show how to modify the FPT algorithm from Section 4 to obtain a 3-approximation
algorithm with running time $O(n \log n)$. This algorithm is easy to implement iteratively, and this
may be preferable in practice. In order to minimize the differences to the FPT algorithm, however,
we describe it as a recursive algorithm. There are four differences to the FPT algorithm:

• Instead of deciding whether $e(T_1, T_2, F_2) \le k$, an invocation $\textsc{Maf}(F_1, F_2)$ returns an integer
$k''$ such that $e(T_1, T_2, F_2) \le k'' \le 3e(T_1, T_2, F_2)$. Thus, there is no need for a parameter $k$ to
the invocation or for an equivalent of Step 1 of the FPT algorithm, and whenever Step 2 of
the FPT algorithm would have returned "yes", we now return 0 as our approximation $k''$ of
$e(T_1, T_2, F_2)$ because $F_2$ is an AF of $T_1$ and $T_2$.

• We execute Step 5 only if the immediately preceding execution of Step 4 chose a new sibling
group. This ensures that this step is executed only once per sibling group. As discussed in
Section S7, the cost per sibling group $\{a_1, a_2, \ldots, a_m\}$ is $O(m)$. Thus, the total cost of all
executions of Step 5 in all recursive invocations is $O(n)$. After executing Step 5, implemented
as discussed in Section S7, every node $p$ in $F_2$ has at most one node $a_i$ as a child, which is
stored as $p$'s representative $r_p$. This allows us to merge sibling pairs $\{a_i, a_j\}$ that arise as a
result of edge cuts after we executed Step 5 without re-executing this step: Whenever we
contract a degree-2 vertex whose only child is a node $a_i$ in the current sibling group and
whose parent is $p$, we set $r_p := a_i$ if $r_p = \mathrm{nil}$; otherwise we resolve the sibling pair $\{a_i, r_p\}$ as
in Step 5 and store $(a_i, r_p)$ as $p$'s new representative.

• We do not execute Step 6, as this would require linear time per invocation. Instead, we assign
a depth estimate $d_x$ to each node $x$ and use it to choose the order in which to inspect the
members of the current sibling group. Initially, $d_x$ is $x$'s depth in $T_2$, which is easily computed
in linear time for all nodes $x \in T_2$. In general, $d_x$ is one more than the depth of $p_x$ in $T_2$, where
$p_x$ is $x$'s parent in $F_2$. In particular, $d_x$ is an upper bound on $x$'s depth in $F_2$ and, for two
nodes $x$ and $y$ with LCA $l$, we have $d_y > d_x$ if $x$ is a child of $l$ and $y$ is not. When choosing a
new sibling group in Step 4, we insert all group members into a max-priority queue $Q$, with
their depth estimates as their priorities. When contracting the degree-2 parent $p_x$ of a node $x$,
we set $d_x := d_{p_x}$. If $x$ is a member of the current sibling group, we update its priority in $Q$.
When resolving a sibling pair $\{a_i, a_j\}$, we remove $a_i$ and $a_j$ from $Q$, set $d_{(a_i,a_j)} := d_{a_i}$, and
insert $(a_i, a_j)$ into $Q$. Finally, when cutting an edge $e_{a_i}$, for a member $a_i$ of the current sibling
group, we remove $a_i$ from $Q$. These updates take $O(\log n)$ time per modification of $F_2$. Since
we modify $F_2$ at most $O(n)$ times, the total cost of all priority queue operations is $O(n \log n)$.

• We do not distinguish between Steps 7 and 8 (having no concept of $a_0$) and also do not
distinguish between the various cases of the steps. Instead, we have a single Step 7 with four
cases that each make one recursive call (see Figure S7).



7.1. If $m = 2$ and this sibling group was chosen in Step 4 of the current invocation, make one
recursive call $\textsc{Maf}(F_1, F_2 \div \{e_{a_1}, e_{p_{a_1}}, e_{a_2}\})$ and return 3 plus its return value.

7.2. If $m = 2$ and this sibling group was chosen in Step 4 of a previous invocation, make one
recursive call $\textsc{Maf}(F_1, F_2 \div \{e_{a_1}, e_{p_{a_1}}, e_{a_2}, e_{p_{a_2}}\})$ and return 4 plus its return value.

7.3. If $m > 2$ and this sibling group was chosen in Step 4 of the current invocation, let $a_1$
and $a_2$ be the two entries with maximum priority in $Q$, ordered so that $d_{a_1} \ge d_{a_2}$. If
$a_2$'s parent has a single sibling, this sibling is a member $a_j$ of the current sibling group,
and $a_1$'s parent is either a root or has a sibling that is not a member of the current
sibling group, let $x = 2$; otherwise let $x = 1$. Remove $a_x$ from $Q$, make one recursive call
$\textsc{Maf}(F_1, F_2 \div \{e_{a_x}, e_{p_{a_x}}\})$, and return 2 plus its return value.

7.4. If $m > 2$ and this sibling group was chosen in Step 4 of a previous invocation, delete the
node $a_1$ with maximum priority from $Q$, make one recursive call $\textsc{Maf}(F_1, F_2 \div \{e_{a_1}, e_{p_{a_1}}\})$,
and return 2 plus its return value.
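The max-priority queue $Q$ used for the depth estimates needs insertion, removal of an arbitrary member, priority updates, and extraction of the maximum. One way to obtain these operations in Python is a binary heap with lazy deletion, sketched below. The class name `MaxPriorityQueue` and its method names are hypothetical, and lazy deletion is an implementation choice of this sketch rather than something the text prescribes; it gives logarithmic amortized cost per operation because every heap entry is pushed and popped at most once.

```python
import heapq
import itertools


class MaxPriorityQueue:
    """Max-priority queue with removal and priority updates via lazy deletion."""

    def __init__(self):
        self._heap = []                    # entries (-priority, tie_breaker, item)
        self._priority = {}                # current priority of each live item
        self._counter = itertools.count()  # tie breaker so items are never compared

    def insert(self, item, priority):
        self._priority[item] = priority
        heapq.heappush(self._heap, (-priority, next(self._counter), item))

    def update(self, item, priority):
        # The old heap entry becomes stale and is skipped when it surfaces.
        self.insert(item, priority)

    def remove(self, item):
        self._priority.pop(item, None)

    def pop_max(self):
        while self._heap:
            neg, _, item = heapq.heappop(self._heap)
            if self._priority.get(item) == -neg:  # entry is still current
                del self._priority[item]
                return item, -neg
        raise KeyError("pop from empty queue")

    def __len__(self):
        return len(self._priority)
```

Under these assumptions, resolving a sibling pair $\{a_i, a_j\}$ corresponds to two `remove` calls followed by one `insert` of the merged node with priority $d_{a_i}$, and Cases 7.3 and 7.4 correspond to one or two `pop_max` calls.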

Using these modifications, we obtain the following theorem. 
(a) $m = 2$: A single application of Case 7.1 resolves the sibling group $\{a_1, a_2\}$.

(b) $m = 4$: A single application of Case 7.3 resolves the sibling group $\{a_1, a_2, a_3, a_4\}$.

(c) The set of edges cut on the same input if we did not give preference to $a_2$ in Case 7.3. Since
cutting edge $e_{B_2}$ makes the sibling group agree with $F_1$ in this case, this would not give a
3-approximation.

(d) $m = 4$: The optimal solution needs to cut 2 edges, while our algorithm cuts 6.

(e) The sequence of cuts on the same input if we did not give preference to $a_2$ in Case 7.3. Note
that we obtain the same forest.

Figure S7: Illustration of the various cases of Step 7 of the approximation algorithm. Only $F_2$ is
shown. Edges that are cut in each step are shown in bold.



Theorem 3. Given two rooted X-trees $T_1$ and $T_2$, a 3-approximation of $e(T_1, T_2, T_2) = d_{SPR}(T_1, T_2)$
can be computed in $O(n \log n)$ time.



Proof. We use the algorithm just described. This algorithm consists of Steps 2-5 of the FPT
algorithm plus the modified Step 7 above. In addition, there is a linear-time preprocessing step for
computing the initial depth estimates of all nodes of $T_2$. We argued already that all executions of
Step 5 take linear time in total. In Section S7, we argued that each execution of Step 2, 3 or 4 of
the FPT algorithm takes constant time. In the approximation algorithm, if Step 4 chooses a new
sibling group, it also needs to insert the members of the sibling group into the priority queue. This
takes $O(m \log m)$ time, $O(n \log n)$ in total for all sibling groups. Each execution of Step 7 takes
$O(\log n)$ time, constant time for the modifications of $F_2$ it performs and $O(\log n)$ time for the $O(1)$
corresponding priority queue operations. Thus, to obtain the claimed time bound of $O(n \log n)$ for
the entire algorithm, it suffices to show that each step of the algorithm is executed $O(n)$ times.

This is easy to see for Steps 3, 5 and 7. Each execution of Step 5 reduces the number of nodes
in $R_t$ by one, the number of nodes in $R_t$ never increases, and initially $R_t$ contains the $n$ leaves of $T_1$.
Each execution of Step 3 or 7 cuts at least one edge in $F_1$ or $F_2$, and initially these two forests have
$O(n)$ edges.

For Steps 2 and 4, we observe that they cannot be executed more often than Steps 3, 5 and 7
combined because any two executions of Step 2 or 4 have an execution of Step 3, 5 or 7 between
them.

It remains to bound the approximation ratio of the algorithm. First observe that the value
$k'$ returned by the algorithm satisfies $k' \ge e(T_1, T_2, T_2)$ because the input forest $F_2$ of the final
invocation $\textsc{Maf}(F_1, F_2)$ is an AF of $T_1$ and $T_2$ and the algorithm returns the number of edges cut to
obtain $F_2$ from $T_2$. To prove that $k' \le 3e(T_1, T_2, T_2)$, let $r$ be the number of descendant invocations
of the current invocation $\textsc{Maf}(F_1, F_2)$, not counting the current invocation itself, and let $k''$ be
its return value. We use induction on $r$ to prove that $k'' \le 3e(T_1, T_2, F_2)$ if $\textsc{Maf}(F_1, F_2)$ chooses
a new sibling group in Step 4. We call such an invocation a master invocation. We also consider
the last invocation of the algorithm to be a master invocation. For two master invocations without
another master invocation between them, all invocations between the two invocations are slave
invocations of the first of the two master invocations, as they manipulate the sibling group chosen
in this invocation.

As a base case observe that, if $r = 0$, then $e(T_1, T_2, F_2) = 0$ and the invocation $\textsc{Maf}(F_1, F_2)$
returns $k'' = 0$ in Step 2. For the inductive step, consider a master invocation $X = \textsc{Maf}(F_1, F_2)$
with $r > 0$, and let $X' = \textsc{Maf}(F'_1, F'_2)$ be the first master invocation after $X$. By the inductive
hypothesis, $X'$ returns a value $k'''$ such that $k''' \le 3e(T_1, T_2, F'_2)$.

If $m = 2$ in invocation $X$, we invoke Case 7.1, which cuts edges $e_{a_1}$, $e_{p_{a_1}}$, and $e_{a_2}$ and, thus,
makes the sibling group agree between $F_1$ and $F_2$ (see Figure S7(a)). This implies that $k'' = k''' + 3$.
By Lemma 6, we have $e(T_1, T_2, F'_2) \le e(T_1, T_2, F_2) - 1$. Since $k''' \le 3e(T_1, T_2, F'_2)$, this shows that
$k'' \le 3e(T_1, T_2, F_2)$.



If $m > 2$ in invocation $X$, this invocation applies Case 7.3, each of its slaves applies Case 7.2
or 7.4, and Case 7.2 is applied at most once. Since the sibling group $\{a_1, a_2, \ldots, a_m\}$ does not agree
between $F_1$ and $F_2$, at least one edge cut in $F_2$ is necessary to make the sibling group agree between
$F_1$ and $F_2$. We distinguish whether one or more edge cuts are required.

If one cut suffices (Figures S7(b) and S7(c)), then there are at most two components of $F_2$ that
contain members of the current sibling group because resolving overlaps in $F_1$ between $q$ components
of $F_2$ requires at least $q - 1$ cuts in $F_2$. Consequently, exactly one component $C$ contains at least
two members of the sibling group. The existence of at least one such component follows because
$m > 2$. If we had another component $C'$ containing at least two members of the current sibling
group, then at least one cut would be required in each of $C$ and $C'$ to make the sibling group agree
between $F_1$ and $F_2$, but we assumed that one cut suffices.

For a single cut to suffice to make the current sibling group agree between $F_1$ and $F_2$, $C$ must
consist of a single path of nodes $x_1, x_2, \ldots, x_t$ such that, for $1 \le j < t$, $x_j$ has two children: $x_{j+1}$
and a member of the current sibling group; $x_t$ has a member $a_{i_t}$ of the current sibling group as
a child, as well as a group $B_{i_t}$ of siblings of $a_{i_t}$ such that no member $a_h$ of the current sibling group
belongs to $F_2^{B_{i_t}}$. Thus, cutting edges $e_{a_{i_t}}$ and $e_{p_{a_{i_t}}}$ makes the sibling group agree between $F_1$ and $F_2$.
Now observe that $a_{i_t}$ and $a_{i_{t-1}}$ are the two members of the current sibling group with the greatest
depth estimates in $C$. If there exists another component $C'$ of $F_2$ that contains a member $a_h$ of
the current sibling group, then $a_h$ is the only such node in $C'$. Thus, the two maximum priority
entries in $Q$ are either $a_{i_t}$ and $a_{i_{t-1}}$ or $a_{i_t}$ and $a_h$. In both cases, invocation $X$ cuts edges $e_{a_{i_t}}$ and
$e_{p_{a_{i_t}}}$ because either $a_h$ has no parent or its parent does not have a member of the current sibling
group as a sibling, and $a_{i_t}$ is preferred over $a_{i_{t-1}}$ by invocation $X$ because $a_{i_t}$'s parent does have
a member of the current sibling group as its only sibling (namely $a_{i_{t-1}}$) and $a_{i_t}$ has a greater depth
estimate than $a_{i_{t-1}}$. Thus, in $X$'s child invocation, the current sibling group agrees between $F_1$ and
$F_2$, which implies that this child invocation is $X'$ and $k'' = k''' + 2$. Moreover, since the sibling group
does not agree between $F_1$ and $F_2$ in invocation $X$, we must have $e(T_1, T_2, F'_2) \le e(T_1, T_2, F_2) - 1$
and, hence, $k'' = k''' + 2 \le 3e(T_1, T_2, F'_2) + 2 < 3e(T_1, T_2, F_2)$.

For the remainder of the proof assume at least two cuts are necessary to make the sibling group
$\{a_1, a_2, \ldots, a_m\}$ agree between $F_1$ and $F_2$ (Figures S7(d) and S7(e)), and assume the members of
the sibling group are ordered by their depth estimates. Let $i_1, i_2, \ldots, i_s$ be the indices such that
invocation $X$ and its slaves cut edges $\{e_{a_{i_j}}, e_{p_{a_{i_j}}} \mid 1 \le j \le s\}$ and, for all $1 \le j \le s$, let $B_{i_j}$ be the
set of $a_{i_j}$'s siblings at the time we cut edges $e_{a_{i_j}}$ and $e_{p_{a_{i_j}}}$. Assume for now that invocation $X$ cuts
edges $e_{a_1}$ and $e_{p_{a_1}}$. Then, for all $1 \le j \le s$, $F_2^{B_{i_j}}$ contains no member of the current sibling group
because such a member $a_h$ would have a greater depth estimate than $a_{i_j}$ and hence $e_{a_h}$ would have
been cut before $e_{a_{i_j}}$. This implies that there exists an edge set $E$ such that $F_2 \div E$ is an MAF of $F_1$
and $F_2$ and $|E \cap \{e_{a_{i_j}}, e_{B_{i_j}} \mid 1 \le j \le s\}| \ge s - 1$. Since cutting edges $e_{a_{i_j}}$ and $e_{B_{i_j}}$ produces the
same result as cutting edges $e_{a_{i_j}}$ and $e_{p_{a_{i_j}}}$, this shows that $e(T_1, T_2, F'_2) \le e(T_1, T_2, F_2) - (s - 1)$.

If $s \ge 3$, we have $2s \le 3(s - 1)$ and, hence, $k'' \le k''' + 3(s - 1) \le 3(e(T_1, T_2, F'_2) + s - 1) \le
3e(T_1, T_2, F_2)$. If $s < 3$, we observe that $e(T_1, T_2, F'_2) \le e(T_1, T_2, F_2) - 2$ because the current sibling
group agrees between $F_1$ and $F'_2$ and we assumed that at least two cuts are necessary in $F_2$ to make
this sibling group agree between $F_1$ and $F_2$. Thus, $k'' \le k''' + 2s \le 3e(T_1, T_2, F'_2) + 4 < 3e(T_1, T_2, F_2)$.
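The two chains of inequalities in this case analysis on $s$ can be written out step by step. The shorthand $e = e(T_1, T_2, F_2)$ and $e' = e(T_1, T_2, F'_2)$ is introduced here only for this restatement and is not notation from the paper.

```latex
\begin{align*}
s \ge 3:\quad & k'' \le k''' + 2s \le k''' + 3(s-1) \le 3e' + 3(s-1) = 3(e' + s - 1) \le 3e, \\
              & \text{using } k''' \le 3e' \text{ and } e' \le e - (s-1). \\[4pt]
s < 3:\quad   & k'' \le k''' + 2s \le 3e' + 4 \le 3(e-2) + 4 = 3e - 2 < 3e, \\
              & \text{using } k''' \le 3e',\; 2s \le 4, \text{ and } e' \le e - 2.
\end{align*}
```

The threshold $s \ge 3$ is exactly where $2s \le 3(s-1)$ starts to hold, which is why the smaller values of $s$ need the separate bound $e' \le e - 2$.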

It remains to deal with the case when invocation $X$ cuts edges $e_{a_2}$ and $e_{p_{a_2}}$. If $a_1 \notin F_2^{B_2}$,
then again $F_2^{B_{i_j}}$ contains no member of the current sibling group, for all $1 \le j \le s$, and the same
argument as above shows that $k'' \le 3e(T_1, T_2, F_2)$. If $a_1 \in F_2^{B_2}$, then observe that either the path
in $F_2$ between $a_1$ and $p_{a_2}$ has at least two internal nodes or $a_2$ has at least two siblings because
otherwise invocation $X$ would prefer $a_1$ over $a_2$ because $a_1$ has the greater depth estimate. This
implies that $a_2$ has the same parent before and after cutting edges $e_{a_1}$ and $e_{p_{a_1}}$, and $a_1$ has the
same parent before and after cutting edges $e_{a_2}$ and $e_{p_{a_2}}$. Thus, $X$ and its child invocation cut the
same four edges $e_{a_1}$, $e_{p_{a_1}}$, $e_{a_2}$, and $e_{p_{a_2}}$ as would have been cut if invocation $X$ had not preferred $a_2$
over $a_1$. The same argument as above now shows that $k'' \le 3e(T_1, T_2, F_2)$. □